Abstract—The way data structures organize data is often a
function of the sequence of past operations. The organization of
data is referred to as the data structure’s state, and the sequence
of past operations constitutes the data structure’s history. A data
structure state can therefore be used as an oracle to derive
information about its history. As a result, for history-sensitive
applications, such as privacy in e-voting, incremental signature
schemes, and regulatory compliant data retention, it is imperative
to conceal historical information contained within data structure
states.

Data structure history can be hidden by making data structures
history independent. In this paper, we explore how to
achieve history independence.

We observe that current history independence notions are
significantly limited in number and scope. There are two existing
notions of history independence: weak history independence
(WHI) and strong history independence (SHI). WHI does not
protect against insider adversaries, and SHI mandates canonical
representations, resulting in inefficiency.

We postulate the need for a broad, encompassing notion of
history independence, which can capture WHI, SHI, and a broad
spectrum of new history independence notions. To this end, we
introduce Δ history independence (ΔHI), a generic game-based
framework that is malleable enough to accommodate existing
and new history independence notions.

As an essential step towards formalizing AHI, we explore the 
concepts of abstract data types, data structures, machine models, 
memory representations and history independence. Finally, to 
bridge the gap between theory and practice, we outline a general 
recipe for building end-to-end, history independent systems and 
demonstrate the use of the recipe in designing two history 
independent file systems. 

Index Terms—History Independence, data structures, regulatory
compliance

I. Introduction 

Data structures are commonly used constructs to store and 
retrieve data in systems. However, data structures carry more 
information than the raw data they organize. One aspect of 
this information is the history leading to the data structure’s 
current state [1]. 

Concealing historical information contained within data
structure states is necessary for incremental signature schemes
[2] and for privacy in voting systems [2], [3], [4], [5].
Therefore, the need arises for data structures that reveal no
information about the history that led to their current state
other than what is inherently visible from the data. History
independence [6] has been devised to enable the design of such
data structures, which are termed “history independent data
structures.”

We have identified the role of history independence in
designing systems that are compliant with data retention
regulations [7]. Retention regulations require that once data
is deleted, no evidence of the deleted data's past existence
be recoverable. Such deletion cannot be achieved by simply
overwriting data, as in secure deletion [8], because overwriting
does not eliminate the effects that the previous existence of
deleted data leaves on the current system state. Even after
secure deletion, the current state can be used as an oracle to
derive information about the past existence of deleted records.
For example, the current organization of data blocks on disk is
a function of the sequence of previous writes to the file system
or to database search indexes. The organization could differ
depending on whether a particular record was deleted in the past
or was never inserted in the data set. Therefore, questions about
history, such as “was John’s record ever in the HIV patients’
dataset?”, can be answered much more accurately than by guessing,
simply by looking at the organization of the search index on disk.
The inference of past existence of deleted data violates
data retention regulations.

However, to architect systems with history independent
characteristics and to prove history independence, we
need a formal notion of data structures, of data structure
states, and of history independence itself. In this paper, we
first formalize all necessary concepts and understand history
independence from a theoretical perspective (Sections III-V).
Then, in Sections VIII and IX, we use the theoretical results
to design two history independent file systems.

II. A Quick Informal Look at HI 

History independence (HI) is concerned with the historical 
information preserved within data structure states. The preserved
history may be illicitly used by adversaries to violate
regulatory compliance. For example, an adversary may breach 
data retention laws by recovering deleted data. Therefore, to 
understand history independence, we need to specify what 
we mean by state, what we mean by history, and what an 
adversary can do. 

What is state? 

A data structure’s state is an organization of data on a physical 
medium such as memory or disk. 

What is history? 

History is the sequence of operations that led to the current 
data structure state. 

What is the threat? 

For many existing data structures, the current state is a function 
of both data and history [1]. Hence, by analyzing the current 
state an adversary can derive the state’s history. Depending 
on the application, the historical information includes the
following: 

• Evidence of past existence of deleted data [10].

• The order in which votes were cast in a voting application
[2], [3].

• Intermediate versions of published documents [2].

(a) Execution sequence    Hash table
    insert(3)             | 3 |   |   |
    insert(6)             | 3 | 6 |   |
    insert(9)             | 3 | 6 | 9 |

(b) Execution sequence    Hash table
    insert(6)             | 6 |   |   |
    insert(9)             | 6 | 9 |   |
    insert(3)             | 6 | 9 | 3 |

Fig. 1. A history dependent hash table organizes the same data set differently
depending on the sequence of operations (i.e., history). In this example, the
hash table uses linear probing [9]. The number of hash table buckets is 3 and
the hash function is modulo 3.

To illustrate, consider the sample hash table data structure 
of Figure 1. The sample hash table organizes the same data 
set differently depending on the sequence of operations used. 
Hence, an adversary that looks at the system memory can 
potentially detect which operation sequence was used to get 
to the current hash table state. 
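The behavior in Fig. 1 can be reproduced with a short Python sketch. It assumes, as in the figure, 3 buckets, h(key) = key % 3, and linear probing; the function names `insert` and `run` are illustrative and not taken from the paper.

```python
from typing import List, Optional

NUM_BUCKETS = 3  # as in Fig. 1


def insert(table: List[Optional[int]], key: int) -> None:
    """Insert key with linear probing, mutating the table in place."""
    start = key % NUM_BUCKETS
    for step in range(NUM_BUCKETS):
        slot = (start + step) % NUM_BUCKETS
        if table[slot] is None:
            table[slot] = key
            return
    raise RuntimeError("table full")


def run(sequence: List[int]) -> List[Optional[int]]:
    """Apply a sequence of insert operations to an empty table."""
    table: List[Optional[int]] = [None] * NUM_BUCKETS
    for key in sequence:
        insert(table, key)
    return table


layout_a = run([3, 6, 9])  # execution sequence (a)
layout_b = run([6, 9, 3])  # execution sequence (b)
print(layout_a, layout_b)  # [3, 6, 9] [6, 9, 3]
```

Both runs store the same set {3, 6, 9}, yet the bucket contents differ, so reading the array is enough to tell the two insertion orders apart.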

What is history independence? 

History independence is a characteristic of a data structure. 
A data structure is said to be history independent if from 
the adversary’s point of view, the current data structure state 
is a function of data only and not of history. Thus, the 
current state of a history independent data structure reveals no 
information to the adversary about its history other than what 
is inherently visible from the data itself. We emphasize that 
history independence is concerned with historical information 
that is revealed from data organization and not from the data. 

Are there different kinds of history independence? 

Naor et al. [11] introduced two notions of history independence:
weak history independence (WHI) and strong history
independence (SHI).

WHI and SHI differ in the number of data structure states an
adversary is permitted to observe. Under WHI, an adversary
is permitted to observe only the current data structure state,
as in the case of a stolen laptop. Under SHI, an adversary is
permitted several observations of data structure states
throughout a sequence of operations, as in the case of an
insider adversary who can obtain a periodic memory dump. For
SHI, the adversary should be unable to identify which sequence
of operations was applied between any two adjacent observations.

A. Our Contributions 

WHI assumes a weak adversary, while SHI is a very powerful
notion of history independence, secure even against a
computationally unbounded adversary [1]. Currently, applications
are restricted to using data structures with either WHI or
SHI characteristics. However, applications that fit neither
WHI nor SHI do exist, for example, a journaling system that
reveals no historical information other than the last k
operations¹. Further, WHI does not protect against insider
adversaries and SHI results in inefficiency [12]. Hence, new
notions of history independence targeted towards specific
application scenarios are needed.

In this paper, we take the first steps towards better
understanding the history independence spectrum and its
applicability to systems. The contributions of this paper are:

• The exploration of abstract data types, data structures, 
machine models, and memory representations (Section 
III). This is an essential step towards formalizing history 
independence. 

• New game-based definitions of weak and strong history 
independence (Sections IV-A and IV-B) that are more 
appropriate for the security community as compared to 
existing terminology [11], [6]. 

• A new notion of history independence termed Δ history
independence (ΔHI). ΔHI centers around a generic
game-based definition of history independence and is
malleable enough to accommodate WHI, SHI, and a
broad spectrum of new history independence notions
(Section V-A). In addition, ΔHI helps to quantify the history
revealed or hidden by existing data structures, most of
which have been designed without history independence
in mind.

• A general recipe for designing history independent systems
and the recipe’s use in designing a history independent
file system (Section VIII).

• The design and evaluation of a delete agnostic file system
(DAFS). In DAFS, we re-design the file system layer to
support new history independence notions. DAFS also 
increases file system resilience via journaling in the 
presence of history independence. 

III. Preliminaries 

Formalizing history independence requires an understanding 
of data structures. A data structure itself can be viewed as an 
implementation of an abstract data type (ADT) on a machine 
model [1]. An abstract data type (ADT) is a specification 
of operations for data organization while a machine model 
represents a physical computing machine. 

In the following, we provide an overview of ADTs, data 
structures, machine models, and memory representations as 
proposed in [1] that are relevant to history independence. Then, 
in Section IV we formalize history independence. 

A. Abstract Data Type (ADT) 

The specification of data organization techniques is often done 
via abstract data types. The key characteristic of an ADT 
is that it specifies operations independently of any specific 
implementation. We use the concept proposed by Golovin et 
al. [1], wherein an ADT is considered as a set of states together 
with a set of operations. Each operation maps the current state 
to a new state. 

Definition 1. Abstract Data Type (ADT) 

An ADT $\mathcal{A}$ is a pentuple $(\mathcal{S}, s_\emptyset, \mathcal{O}, \Gamma, \Psi)$, where $\mathcal{S}$ is a set of
states; $s_\emptyset \in \mathcal{S}$ is the initial state; $\mathcal{O}$ is a set of operations; $\Gamma$
is a set of inputs; $\Psi$ is a set of outputs; and each operation
$o \in \mathcal{O}$ is a function $o : \mathcal{S} \times \Gamma_o \rightarrow \mathcal{S} \times \Psi_o$, where $\Gamma_o \subseteq \Gamma$
and $\Psi_o \subseteq \Psi$.²

¹We give additional examples in Section V-A.

The ADT is initialized to state $s_\emptyset$. When an operation $o \in \mathcal{O}$
with input $i \in \Gamma_o$ is applied to an ADT state $s_1$, the ADT
outputs $\tau \in \Psi_o$ and transitions to a state $s_2$. The transition
from $s_1$ to $s_2$ is denoted as $o(s_1, i) \rightarrow (s_2, \tau)$.
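As a concrete illustration of Definition 1, the following Python sketch models a "set" ADT. Choosing a set ADT, representing states as frozensets, and using `None` for "no output" are assumptions made for the example; the definition itself is abstract.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

State = FrozenSet[int]
Output = Optional[bool]  # None stands for "no output"
Operation = Callable[[State, int], Tuple[State, Output]]


def insert(s: State, i: int) -> Tuple[State, Output]:
    """insert(s, i) -> (s with i added, no output)."""
    return s | {i}, None


def delete(s: State, i: int) -> Tuple[State, Output]:
    """delete(s, i) -> (s with i removed, no output)."""
    return s - {i}, None


def search(s: State, i: int) -> Tuple[State, Output]:
    """search(s, i) -> (s unchanged, True iff i is in s)."""
    return s, i in s


INITIAL_STATE: State = frozenset()  # s_empty: the empty set
OPERATIONS: Dict[str, Operation] = {"insert": insert, "delete": delete, "search": search}

# One transition o(s1, i) -> (s2, tau) with o = insert and i = 6.
s1: State = frozenset({1, 3})
s2, tau = OPERATIONS["insert"](s1, 6)
print(sorted(s2), tau)  # [1, 3, 6] None
```

Each operation is a pure function of (state, input). This mirrors the signature $o : \mathcal{S} \times \Gamma_o \rightarrow \mathcal{S} \times \Psi_o$ and keeps the ADT free of any memory layout.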

The necessity of ADTs. History independence requires that 
from an adversary’s point of view, the current data structure 
state is a function of data only and not of history. In the 
context of history independence, an ADT models the history 
revealed by data only. Since we view a data structure as an 
ADT implementation (Section III-C), the ADT helps to clearly 
identify what the data structure is permitted or not permitted to 
reveal about past operations. Any history revealed by an ADT 
state can be revealed by the corresponding data structure state. 
Any history hidden by an ADT state must be hidden by the 
corresponding data structure state. 

ADT as a graph. We can imagine the ADT to be a directed
graph $\mathcal{G}$, where each vertex represents an ADT state and each
edge is labeled with an ADT operation along with an ADT
input and an ADT output. The label for an edge between
two vertices represents the operation that causes the transition
between the corresponding states. We call the graph $\mathcal{G}$ the
state transition graph of the ADT.
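To make the graph view concrete, the sketch below builds $\mathcal{G}$ for a set ADT by breadth-first exploration from the empty set. Restricting the operations to insert/delete over the illustrative universe {1, 3, 6} is an assumption that keeps the graph finite. Edge outputs are omitted because these two operations produce none.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Tuple

State = FrozenSet[int]
Edge = Tuple[str, int, State]  # (operation, input, target state); outputs omitted
UNIVERSE = (1, 3, 6)           # illustrative key universe


def successors(s: State) -> List[Edge]:
    """All labeled edges leaving ADT state s."""
    edges: List[Edge] = []
    for key in UNIVERSE:
        edges.append(("insert", key, s | {key}))
        edges.append(("delete", key, s - {key}))
    return edges


def build_graph(initial: State) -> Dict[State, List[Edge]]:
    """Breadth-first exploration of every state reachable from `initial`."""
    graph: Dict[State, List[Edge]] = {}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if s in graph:
            continue
        graph[s] = successors(s)
        queue.extend(t for _, _, t in graph[s] if t not in graph)
    return graph


G = build_graph(frozenset())
print(len(G))  # 8: every subset of {1, 3, 6} is a vertex
```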

Viewing an ADT as a graph will be particularly useful when 
we take a deeper look into history independence in Section 
III-D. 

B. Models of Execution 

An ADT is only a specification of operations for organizing 
data. For more practical use, such as for efficiency analysis, 
concrete implementations of the ADT operations are required. 
ADT implementations are provided via programs that 
can be executed on a given machine model. An ADT’s 
implementation in a given machine model is a data structure 
(Section III-C). 

RAM Model of Execution. The RAM model of execution 
models a traditional serial computer. The model consists of 
two components, a central processing unit (CPU) and a random 
access memory (RAM). Both the CPU and RAM are finite 
state machines (FSM) [13]. 

The RAM consists of $m$ storage locations, where $m$ is a power
of two. Each location is a $b$-bit word and has a unique $\log_2 m$-bit
address associated with it³. Two operations are permitted on a storage
location in the RAM. First, a load operation accesses the $b$-bit
word stored at the location. Second, a store operation
copies a given $b$-bit word to the location. Typically, the $b$-bit
words are copied to or from CPU registers.
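The memory half of the bounded RAM model can be pictured with a toy Python class. The class name, the parameters m = 8 and b = 16, and the choice to reject words wider than b bits are all illustrative assumptions.

```python
class BoundedRAM:
    """Toy bounded RAM memory: m locations, each holding one b-bit word."""

    def __init__(self, m: int, b: int) -> None:
        if m <= 0 or m & (m - 1):
            raise ValueError("m must be a power of two")
        self.m, self.b = m, b
        self.addr_bits = m.bit_length() - 1  # log2(m)-bit addresses
        self.words = [0] * m                 # every location starts at 0

    def _check(self, addr: int) -> None:
        if not 0 <= addr < self.m:
            raise IndexError("address out of range")

    def load(self, addr: int) -> int:
        """Return the b-bit word stored at addr."""
        self._check(addr)
        return self.words[addr]

    def store(self, addr: int, word: int) -> None:
        """Copy a b-bit word to addr; values wider than b bits are rejected."""
        self._check(addr)
        if not 0 <= word < (1 << self.b):
            raise ValueError(f"word does not fit in {self.b} bits")
        self.words[addr] = word


ram = BoundedRAM(m=8, b=16)
ram.store(5, 0xBEEF)
print(ram.addr_bits, hex(ram.load(5)))  # 3 0xbeef
```

Because the memory has finitely many locations of fixed width, the set of possible memory contents is finite. The document relies on this fact later for bounded machine states.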

The CPU consists of $n$ $b$-bit registers and operates on a
fetch-and-execute cycle [13]. The CPU has an associated set of
instructions that it can perform. CPU instructions are specified
in a programming language. A program in a RAM model is
a finite sequence of programming language instructions.

²For brevity, we model each ADT operation with an input and an output.
ADT operations may accept no inputs or produce no outputs. Hence, an ADT
operation can also be modeled as one of the following functions: $o : \mathcal{S} \rightarrow \mathcal{S}$,
$o : \mathcal{S} \rightarrow \mathcal{S} \times \Psi_o$, or $o : \mathcal{S} \times \Gamma_o \rightarrow \mathcal{S}$.

³This is a bounded-memory RAM.

A machine model can itself be considered as an ADT [1]. 
In this case, the set of ADT states is the set of all machine 
states, and the set of ADT operations is the set of all machine 
programs. For the RAM model, the set of ADT states, the 
set of inputs, and the set of outputs are all represented as bit 
strings. 

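Definition 2. Bounded RAM Machine Model
A bounded RAM machine model $\mathcal{M}$ with $m$ $b$-bit memory
words and $n$ $b$-bit CPU registers is a pentuple $(\mathcal{S}^{\mathcal{M}}, s_\emptyset^{\mathcal{M}}, \mathcal{P}^{\mathcal{M}}, \Gamma^{\mathcal{M}}, \Psi^{\mathcal{M}})$,
where $\mathcal{S}^{\mathcal{M}} = \{0,1\}^{b(m+n)}$ is the set of machine states; $s_\emptyset^{\mathcal{M}} \in \mathcal{S}^{\mathcal{M}}$ is
the initial state; $\mathcal{P}^{\mathcal{M}}$ is the set of all programs of $\mathcal{M}$; $\Gamma^{\mathcal{M}} = \{0,1\}^*$
is a set of inputs; $\Psi^{\mathcal{M}} = \{0,1\}^*$ is a set of outputs; and each
program $p \in \mathcal{P}^{\mathcal{M}}$ is a function $p : \mathcal{S}^{\mathcal{M}} \times \Gamma_p \rightarrow \mathcal{S}^{\mathcal{M}} \times \Psi_p$, where
$\Gamma_p \subseteq \Gamma^{\mathcal{M}}$ and $\Psi_p \subseteq \Psi^{\mathcal{M}}$.

$\mathcal{M}$ is initialized to state $s_\emptyset^{\mathcal{M}}$. If a program $p \in \mathcal{P}^{\mathcal{M}}$ with input
$\iota \in \Gamma_p$ is executed by the CPU when $\mathcal{M}$ is in state $s_1$, $\mathcal{M}$
outputs $\rho \in \Psi_p$ and transitions to a state $s_2$. The transition
from $s_1$ to $s_2$ is denoted as $p(s_1, \iota) \rightarrow (s_2, \rho)$.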

C. Data Structure 

An implementation for an ADT in a given machine model is 
obtained as follows. 

• A machine representation is chosen for each ADT input 
and output. 

• For each ADT operation a machine program is selected 
that provides the functionality desired from the ADT 
operation. 

• A unique machine state is selected to represent the initial 
ADT state. 

We encapsulate the above steps in the following data structure 
definition. 

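Definition 3. Data Structure

A data structure implementation of an ADT $\mathcal{A}$ in a bounded
RAM machine model $\mathcal{M}$ is a quadruple $(\alpha, \beta, \gamma, s_\emptyset^{\mathcal{D}})$,
where $\mathcal{A} = (\mathcal{S}, s_\emptyset, \mathcal{O}, \Gamma, \Psi)$ as per definition 1,
$\mathcal{M} = (\mathcal{S}^{\mathcal{M}}, s_\emptyset^{\mathcal{M}}, \mathcal{P}^{\mathcal{M}}, \Gamma^{\mathcal{M}}, \Psi^{\mathcal{M}})$ as per definition 2, $\alpha : \Gamma \rightarrow \Gamma'$,
$\beta : \Psi \rightarrow \Psi'$, $\gamma : \mathcal{O} \rightarrow \mathcal{P}^{\mathcal{M}}$, $s_\emptyset^{\mathcal{D}} \in \mathcal{S}^{\mathcal{M}}$, $\Gamma' \subseteq \Gamma^{\mathcal{M}}$ and
$\Psi' \subseteq \Psi^{\mathcal{M}}$.

$\alpha$ is a mapping from ADT inputs to machine inputs. That
is, for any ADT input $i$, $\alpha(i)$ is the machine representation
of the input. Similarly, $\beta$ is the mapping from ADT outputs
to machine outputs. $\gamma$ is the mapping from ADT operations
to machine programs. For an ADT operation $o$, $\gamma(o)$ is the
machine program implementing $o$. Finally, just as the ADT $\mathcal{A}$
is initialized to a unique state $s_\emptyset$, a unique machine state $s_\emptyset^{\mathcal{D}}$
is selected to represent the initial data structure state.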
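The quadruple $(\alpha, \beta, \gamma, s_\emptyset^{\mathcal{D}})$ can be sketched in Python for the hash table of Fig. 2(b). Several simplifications are assumptions: a tuple of buckets stands in for a RAM bit string, $\alpha$ and $\beta$ are identity maps, search is simplified to a scan, and delete is omitted.

```python
from typing import Callable, Dict, Optional, Tuple

BUCKETS = 3
MachineState = Tuple[Optional[int], ...]  # stand-in for a RAM bit string
Output = Optional[bool]
Program = Callable[[MachineState, int], Tuple[MachineState, Output]]


def alpha(i: int) -> int:
    """ADT input -> machine input (identity in this sketch)."""
    return i


def beta(t: Output) -> Output:
    """ADT output -> machine output (identity in this sketch)."""
    return t


def insert_program(s: MachineState, x: int) -> Tuple[MachineState, Output]:
    """gamma(insert): linear probing with h(x) = x % BUCKETS."""
    if x in s:
        return s, None
    arr = list(s)
    for step in range(BUCKETS):
        slot = (x % BUCKETS + step) % BUCKETS
        if arr[slot] is None:
            arr[slot] = x
            return tuple(arr), None
    raise MemoryError("no free bucket")


def search_program(s: MachineState, x: int) -> Tuple[MachineState, Output]:
    """gamma(search): simplified to a scan of all buckets."""
    return s, x in s


gamma: Dict[str, Program] = {"insert": insert_program, "search": search_program}
S_EMPTY_D: MachineState = (None,) * BUCKETS  # initial data structure state

state, out = gamma["insert"](S_EMPTY_D, alpha(6))
print(state, beta(out))  # (6, None, None) None
```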

Data Structure State. A data structure state is a machine 
state. The set of all data structure states consists of all machine 
states that are reachable from the initial data structure state 
via execution of machine programs implementing the ADT 
operations. 

State Transition Graph For Data Structure. A data structure
can be considered to be a directed graph $\mathcal{G}^{\mathcal{D}}$, where each vertex
represents a data structure state and each edge is labeled with a
machine program implementing an ADT operation along with
a machine input and a machine output. The label for an edge
between two vertices represents the program that causes the
transition between the corresponding states. We call the graph
$\mathcal{G}^{\mathcal{D}}$ the state transition graph of the data structure.

TABLE I
Sample paths from ADT and data structure state transition graphs.

Path                                                           | From Figure
$p_a$: $s_\emptyset$ → {1} → {1,3} → {1,3,6}                  | 2(a)
$p'_a$: $s_\emptyset$ → {6} → {1,6} → {1,3,6}                 | 2(a)
$p_b$: $s_\emptyset^{\mathcal{D}}$ → << _,1,_ >> → << 3,1,_ >> → << 3,1,6 >>  | 2(b)
$p'_b$: $s_\emptyset^{\mathcal{D}}$ → << 6,_,_ >> → << 6,1,_ >> → << 6,1,3 >> | 2(b)

D. A Semi-Formal Look At HI 

The non-isomorphism problem. In Section II we introduced
the two existing history independence notions: weak history
independence (WHI) and strong history independence (SHI)⁴.

Non-isomorphism between the state transition graph of an
ADT and that of its data structure implementation breaks SHI.
WHI, on the other hand, can be achieved even when the ADT
and data structure state transition graphs are non-isomorphic.
First, we look at how non-isomorphism breaks SHI, and then
we discuss how to achieve WHI in the presence of
non-isomorphism.

Why does non-isomorphism break SHI? Non-isomorphism, and
thus the need for SHI, arises when an ADT state has multiple
memory representations⁵. We will precisely define memory
representations for ADT states in Section III-E. For now, it
suffices to say the following: a memory representation for an
ADT state that is reachable from the initial ADT state via a
sequence of ADT operations is the machine state reachable
from the initial data structure state via the corresponding
program sequence. For example, in Figure 2, the data structure
states << 3,1,6 >> and << 6,1,3 >> are memory
representations of the ADT state {1,3,6}.

To illustrate how non-isomorphism breaks SHI, consider the
example graphs from Figure 2, the example paths from Table I,
and an adversary with access to the initial ADT state $s_\emptyset$,
the initial data structure state $s_\emptyset^{\mathcal{D}}$, the current ADT state
{1,3,6}, and the current data structure state, which is either
<< 3,1,6 >> or << 6,1,3 >>.

By looking at the ADT states alone, the adversary cannot
determine which sequence of ADT operations was used to
arrive at the current ADT state {1,3,6}. This is because there
are two paths, $p_a$ and $p'_a$, between the ADT states $s_\emptyset$ and
{1,3,6}. Moreover, the ADT states carry no information about
the exact path used to transition from $s_\emptyset$ to {1,3,6}. Hence,
the data alone gives the adversary no advantage in guessing
which sequence of ADT operations was applied in the past.

⁴Both WHI and SHI are formalized in Section IV.

⁵Many existing data structures have this property and hence are not
history independent. Common examples include linked lists, hash tables,
and B-trees. In these data structures, different insertion orders of the same
set of data elements (i.e., the same ADT state) result in different memory
representations.

Now, by looking at the current data structure state, the
adversary can clearly identify which sequence of machine
programs was used to arrive at it. The current data structure
state is either << 3,1,6 >> or << 6,1,3 >>, and in the partial
graph of Figure 2(b) there is a unique path from the initial data
structure state to each of these two states. Hence, by observing
the current data structure state, the adversary can identify
whether path $p_b$ or path $p'_b$ was used to transition from state
$s_\emptyset^{\mathcal{D}}$ to the current data structure state. Identifying the path in
the data structure state transition graph informs the adversary
of the program sequence used. Knowledge of the program
sequence used in turn tells the adversary the sequence of ADT
operations used. In conclusion, the data structure implementation
gives the adversary an advantage in guessing the history of past
execution, thereby breaking history independence.
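The argument can be checked exhaustively with a small Python sketch. It enumerates all six insertion orders of {1, 3, 6} into the Fig. 2(b) table (3 buckets, h(key) = key % 3, linear probing) and groups the orders by the layout they produce. Every order reaches the same ADT state, but the full graph yields four distinct layouts. Note that << 3,1,6 >> is reachable from two orders here, whereas the partial graph of Figure 2(b) shows only one. Even so, observing a layout rules out at least four of the six histories.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Optional, Tuple

BUCKETS = 3
Layout = Tuple[Optional[int], ...]


def layout(order: Tuple[int, ...]) -> Layout:
    """Insert keys in the given order with linear probing; return the array."""
    table: List[Optional[int]] = [None] * BUCKETS
    for key in order:
        slot = key % BUCKETS
        while table[slot] is not None:  # linear probing
            slot = (slot + 1) % BUCKETS
        table[slot] = key
    return tuple(table)


histories: Dict[Layout, List[Tuple[int, ...]]] = defaultdict(list)
for order in permutations((1, 3, 6)):
    histories[layout(order)].append(order)

for rep, orders in sorted(histories.items()):
    print(rep, orders)
# (3, 1, 6) [(1, 3, 6), (3, 1, 6)]
# (3, 6, 1) [(3, 6, 1)]
# (6, 1, 3) [(1, 6, 3), (6, 1, 3)]
# (6, 3, 1) [(6, 3, 1)]
```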

How can we achieve history independence? There are two known
ways to make data structures history independent:

1) For SHI, make the ADT and the data structure state 
transition graphs isomorphic: 

Data structures with state transition graphs isomorphic 
to their ADT’s state transition graph are referred to as 
canonically represented data structures. We discuss the 
necessity of canonical representations for SHI in Section 
IV-D. SHI implies WHI. 

2) For WHI, make the data structure state transitions
randomized:

Randomization here refers to the selection of the data
structure state representing the corresponding ADT state.
To illustrate, consider the example graphs from
Figure 2. Both data structure states << 3,1,6 >> and
<< 6,1,3 >> are valid memory representations of the
ADT state {1,3,6}. For WHI, the choice between data
structure states << 3,1,6 >> and << 6,1,3 >> to
represent the ADT state {1,3,6} must be random.

As shown in Figure 3, randomization translates to the
addition of new paths in the data structure state transition
graph to ensure the following: for any two ADT states $s_0$
and $s_1$, if there is a path in the ADT state transition graph
between $s_0$ and $s_1$, then there must be a path from every
memory representation of ADT state $s_0$ to every memory
representation of ADT state $s_1$ in the data structure’s
state transition graph. The choice of path in the data
structure state transition graph between representations
of ADT states $s_0$ and $s_1$ is then made at random.

From the adversary’s point of view, randomization makes
all memory representations of an ADT state equally likely 
to occur. Hence, observation of a specific representation 
gives the adversary no advantage in guessing the sequence 
of machine programs that led to the current data structure 
state. Since the adversary cannot identify the sequence of 
machine programs used, the adversary is also unable to 
identify the sequence of ADT operations that led to the 
current ADT state. 
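The two approaches can be contrasted on the ADT state {1,3,6} with the Python sketch below. It uses a plain array rather than the paper's hash table, and both layout functions are simplified illustrations, not the paper's constructions. A canonical layout (sorted order) is a fixed function of the set. A randomized layout is a fresh, uniformly random permutation, so every representation is equally likely regardless of history.

```python
import random
from collections import Counter
from typing import FrozenSet, Tuple


def canonical_layout(s: FrozenSet[int]) -> Tuple[int, ...]:
    """Canonical representation: the layout depends only on the set."""
    return tuple(sorted(s))


def randomized_layout(s: FrozenSet[int], rng: random.Random) -> Tuple[int, ...]:
    """Randomized representation: a uniformly random permutation of the set."""
    items = sorted(s)  # start from a fixed order, then shuffle uniformly
    rng.shuffle(items)
    return tuple(items)


state = frozenset({1, 3, 6})
print(canonical_layout(state))  # (1, 3, 6), whatever the history

rng = random.Random(0)  # fixed seed only to make the demo repeatable
counts = Counter(randomized_layout(state, rng) for _ in range(60_000))
print(len(counts), min(counts.values()), max(counts.values()))
```

The counts show all six layouts appearing with roughly equal frequency. In a real WHI structure, this re-randomization would have to happen as part of every state-changing operation.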
[Figure 2: (a) ADT state transition graph; (b) data structure state transition graph.]

Fig. 2. Example of non-isomorphism between ADT and data structure state transition graphs. (a) Partial state transition graph for a sample hash table ADT.
(b) Partial state transition graph for a sample array-based hash table data structure implementation using linear probing. The number of hash table buckets is 3 and
the hash function is $h(key) = key \bmod 3$. $\gamma(\text{insert})$, $\gamma(\text{search})$ and $\gamma(\text{delete})$ denote the machine programs implementing the ADT operations insert, search,
and delete, respectively. $o(i)/\tau$ denotes that ADT operation $o$ takes input $i$ and produces output $\tau$. Similarly, $\gamma(o)(\alpha(i))/\beta(\tau)$ denotes that program $\gamma(o)$
takes input $\alpha(i)$ and produces output $\beta(\tau)$. $\alpha(i)$ and $\beta(\tau)$ are the machine representations of the ADT input $i$ and ADT output $\tau$, respectively. Note that the
vertices in figure (b) represent data structure states. In the RAM model these will be bit strings. However, to convey data semantics, we denote the hash table
array as << $a_0, a_1, a_2$ >>, where $a_0$, $a_1$, and $a_2$ are the elements at buckets 0, 1, and 2, respectively. An underscore denotes an empty bucket. Highlighted paths
are referenced in Table I and in Section III-D.


[Figure 3: hash table data structure state transition graph with added randomized transitions.]

Fig. 3. Using randomization to achieve history independence. The dotted
lines indicate new transitions added to the hash table data structure state
transition graph. Among all edges with the same starting node and the same
label, the choice of edge for state transition is made at random.

E. Memory Representations 

In the discussion of non-isomorphism and history independence
above, we informally introduced memory representations for
ADT states. We also showed that history independence comes
into the picture when an ADT state has multiple memory
representations. In short, the memory representation for an ADT state
that is reachable from the initial ADT state via a sequence of
ADT operations is the machine state reachable from the initial
data structure state via the corresponding program sequence.
We formally define memory representations here and use them
later in Section IV for the game-based definitions of history
independence.

Let $\delta = (o_1, o_2, \ldots, o_n)$ be a sequence of ADT operations
and $I = (i_1, i_2, \ldots, i_n)$ be a sequence of ADT inputs. We
denote by $\Phi(\delta, s_0, I)$ the application of the ADT operation
sequence $\delta$ on ADT state $s_0$:

$$
\Phi(\delta, s_0, I) =
\begin{cases}
s_0 & \text{if } |\delta| = 0 \\
(s_n, \tau_n) \mid o_k(s_{k-1}, i_k) \rightarrow (s_k, \tau_k),\ 1 \le k \le n & \text{otherwise}
\end{cases}
$$

If $\delta$ is empty, no state transition occurs and no outputs are
produced. For a nonempty sequence $\delta$, $s_n$ and $\tau_n$ denote the
ADT state and the ADT output, respectively, produced by the
final operation in sequence $\delta$.

To summarize, we denote by $\Phi(\delta, s_0, I) \rightarrow (s_n, \tau_n)$ that the
ADT operation sequence $\delta$, when applied to the ADT state $s_0$
with ADT input sequence $I$, results in the ADT state $s_n$ and
ADT output $\tau_n$.
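Operationally, $\Phi$ is a left fold of the operation sequence over the start state, as the Python sketch below shows. Modeling operations as plain functions (state, input) -> (state, output), and the two set operations used in the demo, are illustrative assumptions.

```python
from typing import Any, Callable, FrozenSet, Sequence, Tuple

Op = Callable[[Any, Any], Tuple[Any, Any]]


def phi(delta: Sequence[Op], s0: Any, inputs: Sequence[Any]) -> Any:
    """Return s0 for an empty sequence, else (s_n, tau_n) after the last operation."""
    if len(delta) != len(inputs):
        raise ValueError("need exactly one input per operation")
    if not delta:
        return s0  # no transition and no output
    state, tau = s0, None
    for op, i in zip(delta, inputs):  # o_k(s_{k-1}, i_k) -> (s_k, tau_k)
        state, tau = op(state, i)
    return state, tau


def insert(s: FrozenSet[int], i: int) -> Tuple[FrozenSet[int], None]:
    return s | {i}, None


def search(s: FrozenSet[int], i: int) -> Tuple[FrozenSet[int], bool]:
    return s, i in s


print(phi([insert, insert, search], frozenset(), [1, 3, 3]))  # (frozenset({1, 3}), True)
```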

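Now, let $\delta^{\mathcal{M}} = \chi(\delta) = (\gamma(o_1), \gamma(o_2), \ldots, \gamma(o_n))$ be a
sequence of machine programs corresponding to the ADT operation
sequence $\delta$, where $\gamma(o_k)$ is the machine program implementing
the ADT operation $o_k$. Then, we denote by $\Phi^{\mathcal{M}}(\delta^{\mathcal{M}}, s_0^{\mathcal{M}}, I)$
the application of program sequence $\delta^{\mathcal{M}}$ on a machine state $s_0^{\mathcal{M}}$:

$$
\Phi^{\mathcal{M}}(\delta^{\mathcal{M}}, s_0^{\mathcal{M}}, I) =
\begin{cases}
s_0^{\mathcal{M}} & \text{if } |\delta^{\mathcal{M}}| = 0 \\
(s_n^{\mathcal{M}}, \beta(\tau_n)) \mid \gamma(o_k)(s_{k-1}^{\mathcal{M}}, \alpha(i_k)) \rightarrow (s_k^{\mathcal{M}}, \beta(\tau_k)),\ 1 \le k \le n & \text{otherwise}
\end{cases}
$$

Here, $\alpha(i)$ and $\beta(\tau)$ denote the machine representations
of an ADT input $i$ and an ADT output $\tau$, respectively, and $s_n^{\mathcal{M}}$
and $\beta(\tau_n)$ are the machine state and the machine output,
respectively, produced by the final program in sequence $\delta^{\mathcal{M}}$.

In summary, we denote by $\Phi^{\mathcal{M}}(\delta^{\mathcal{M}}, s_0^{\mathcal{M}}, I) \rightarrow (s_n^{\mathcal{M}}, \tau_n^{\mathcal{M}})$
that a program sequence $\delta^{\mathcal{M}}$, when applied to a machine state $s_0^{\mathcal{M}}$
with an ADT input sequence $I$, results in a machine state $s_n^{\mathcal{M}}$
and a machine output $\tau_n^{\mathcal{M}} = \beta(\tau_n)$.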

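Definition 4. Memory Representations

The set of memory representations of an ADT state $s$, denoted
by $m(s)$, is the set of data structure states defined as

$$
m(s) =
\begin{cases}
\{ s_\emptyset^{\mathcal{D}} \} & \text{if } s = s_\emptyset \\
\{ s_k^{\mathcal{M}} \mid \Phi^{\mathcal{M}}(\delta_k^{\mathcal{M}}, s_\emptyset^{\mathcal{D}}, I_k) \rightarrow (s_k^{\mathcal{M}}, \tau_k^{\mathcal{M}}),\ 1 \le k \le n \} & \text{otherwise}
\end{cases}
$$

where $s_\emptyset^{\mathcal{D}}$ is the initial data structure state; $I_1, I_2, \ldots, I_n$
are sequences of ADT inputs; $\delta_1, \delta_2, \ldots, \delta_n$ are ADT operation
sequences, each of which, when applied to the initial ADT
state $s_\emptyset$, results in state $s$, that is, $\Phi(\delta_k, s_\emptyset, I_k) \rightarrow (s, \tau_k)$;
$\delta_k^{\mathcal{M}} = \chi(\delta_k)$ denotes the program sequence corresponding to
ADT operation sequence $\delta_k$; $|I_k| = |\delta_k|$; and $1 \le k \le n$.

Here $m$ is the mapping $m : \mathcal{S} \rightarrow 2^{\mathcal{S}^{\mathcal{D}}}$, where $\mathcal{S}$ is the set
of all ADT states, $\mathcal{S}^{\mathcal{D}}$ is the set of all data structure states,
and $2^{\mathcal{S}^{\mathcal{D}}}$ denotes the power set of $\mathcal{S}^{\mathcal{D}}$.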
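Definition 4 can be approximated by brute force for the hash table of Fig. 2(b), as in the Python sketch below. It enumerates ADT operation sequences, applies both the ADT transition and the corresponding machine program, and collects the layouts reached for each ADT state. Restricting sequences to inserts of length at most 3 over {1, 3, 6} is an assumption that keeps the enumeration finite, so the result is only a lower approximation of m(s) in general.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, FrozenSet, Optional, Set, Tuple

BUCKETS, UNIVERSE, MAX_LEN = 3, (1, 3, 6), 3
Layout = Tuple[Optional[int], ...]


def insert_program(table: Layout, key: int) -> Layout:
    """gamma(insert): linear probing; inserting an existing key is a no-op."""
    if key in table:
        return table
    arr = list(table)
    slot = key % BUCKETS
    while arr[slot] is not None:
        slot = (slot + 1) % BUCKETS
    arr[slot] = key
    return tuple(arr)


def memory_representations() -> Dict[FrozenSet[int], Set[Layout]]:
    m: Dict[FrozenSet[int], Set[Layout]] = defaultdict(set)
    m[frozenset()].add((None,) * BUCKETS)  # m(s_empty) = {initial data structure state}
    for n in range(1, MAX_LEN + 1):
        for keys in product(UNIVERSE, repeat=n):  # delta = n inserts, I = keys
            adt_state: FrozenSet[int] = frozenset()
            table: Layout = (None,) * BUCKETS
            for key in keys:
                adt_state = adt_state | {key}       # ADT transition
                table = insert_program(table, key)  # machine transition
            m[adt_state].add(table)
    return m


m = memory_representations()
print(sorted(m[frozenset({1, 3, 6})]))
# [(3, 1, 6), (3, 6, 1), (6, 1, 3), (6, 3, 1)]
```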

1) Dealing With Infinite ADT State Space 

The set of machine states for the bounded RAM model is
finite since there is a finite number of available bits. Hence, a
data structure implementation on a bounded RAM model can
only have a finite number of data structure states. The set of
ADT states, on the other hand, can be infinite. For an ADT
with infinitely many states, a data structure implementation will be
unable to uniquely represent all the ADT states. The case of
infinite ADT states is of particular importance for canonically
represented data structures, which require the state transition
graphs of the ADT and of the data structure to be isomorphic,
that is, each ADT state has a unique memory representation.

We will look at canonical representations in detail within 
the context of history independence in Section IV-D. Here, 
we list two workarounds for dealing with an infinite ADT state
space.

1) Redefine the ADT, such that the number of ADT states 
is less than or equal to the number of machine states. 

2) Design each machine program implementing an ADT 
operation, such that the program produces a special output 
when an ADT state cannot be represented using the 
available machine bits. For example, an out-of-memory 
error. 
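Workaround 2 is sketched below in Python for a 3-bucket linear-probing table. The sentinel name `OUT_OF_MEMORY`, the tuple representation, and the choice to leave the state unchanged on overflow are illustrative assumptions. The program emits a special output instead of trying to represent an ADT state that does not fit.

```python
from typing import Optional, Tuple

OUT_OF_MEMORY = "out-of-memory"  # hypothetical special output
Table = Tuple[Optional[int], ...]


def bounded_insert(table: Table, key: int) -> Tuple[Table, Optional[str]]:
    """Linear-probing insert that reports OUT_OF_MEMORY when no bucket is free."""
    if key in table:
        return table, None
    n = len(table)
    for step in range(n):
        slot = (key % n + step) % n
        if table[slot] is None:
            return table[:slot] + (key,) + table[slot + 1:], None
    return table, OUT_OF_MEMORY  # state unchanged, special output


t: Table = (None, None, None)
for k in (1, 3, 6, 9):
    t, out = bounded_insert(t, k)
print(t, out)  # (3, 1, 6) out-of-memory
```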

IV. History Independence 

Now that we are equipped with the necessary concepts (ADT,
RAM machine model, data structure, and memory representations),
we proceed to formalize history independence. We give
new game-based definitions for both WHI and SHI (Sections
IV-A and IV-B). The new definitions are equivalent to existing
proposals [2], [6] but are more appropriate for the security
community since they follow the game-based construction of
semantic security. Further, our new definitions naturally extend
to accommodate other notions of history independence beyond
WHI and SHI.

A. Weak History Independence (WHI) 

WHI was introduced for scenarios wherein an adversary
observes only the current data structure state, for example,
in the case of a stolen laptop.

Informally, a data structure is said to be weakly history
independent if for any two sequences of ADT operations $\delta_0$
and $\delta_1$ that take the ADT from initialization to a state $s$,
observation of any memory representation of state $s$ gives the
adversary no advantage in guessing whether sequence $\delta_0$ or
$\delta_1$ was used to get to $s$.

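We define weak history independence (WHI) by the following
game:

Let $\mathcal{A} = (\mathcal{S}, s_\emptyset, \mathcal{O}, \Gamma, \Psi)$ be an ADT, $\mathcal{M} = (\mathcal{S}^{\mathcal{M}}, s_\emptyset^{\mathcal{M}}, \mathcal{P}^{\mathcal{M}}, \Gamma^{\mathcal{M}}, \Psi^{\mathcal{M}})$
be a bounded RAM machine model, and $\mathcal{D} = (\alpha, \beta, \gamma, s_\emptyset^{\mathcal{D}})$ be a data
structure implementing $\mathcal{A}$ in $\mathcal{M}$, as per definitions 1, 2 and 3, respectively.

1) A probabilistic polynomial time-bounded adversary selects the
following: an ADT state $s$; two sequences of ADT operations $\delta_0$
and $\delta_1$; and two sequences of ADT inputs $I_0$ and $I_1$; such that
$\Phi(\delta_0, s_\emptyset, I_0) \rightarrow (s, \tau)$ and $\Phi(\delta_1, s_\emptyset, I_1) \rightarrow (s, \tau)$. Both $\delta_0$ and
$\delta_1$ take the ADT from the initial state $s_\emptyset$ to state $s$, producing the
same output $\tau$.

2) The adversary sends $s$, $\delta_0$, $\delta_1$, $I_0$ and $I_1$ to the challenger.

3) The challenger flips a fair coin $c \in \{0, 1\}$ and computes
$\Phi^{\mathcal{M}}(\delta_c^{\mathcal{M}}, s_\emptyset^{\mathcal{D}}, I_c) \rightarrow (s^{\mathcal{M}}, \tau^{\mathcal{M}})$, where $\delta_c^{\mathcal{M}} = \chi(\delta_c)$ and
$\tau^{\mathcal{M}} = \beta(\tau)$. That is, the challenger applies the program sequence
$\delta_c^{\mathcal{M}}$ corresponding to the ADT operation sequence $\delta_c$ to the
data structure initialization state $s_\emptyset^{\mathcal{D}}$, resulting in a memory
representation $s^{\mathcal{M}}$ of ADT state $s$ and a machine output $\tau^{\mathcal{M}}$.

4) The challenger sends the memory representation $s^{\mathcal{M}}$ to the
adversary.

5) The adversary outputs $c' \in \{0, 1\}$.

The adversary wins the game if $c' = c$.

$\mathcal{D}$ is said to be weakly history independent if the advantage of
the adversary for winning the game, defined as $|\Pr[c' = c] - 1/2|$,
is negligible (where “negligible” is defined over any implementation-specific
security parameters of the programs in $\delta_c^{\mathcal{M}}$)⁶.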


Since WHI permits the adversary to make a single observation,
the adversary is allowed to choose only the end state
in step 1. The starting state for the chosen ADT operation
sequences is always the initial ADT state $s_\emptyset$. Recall from the
data structure definition (Section III-C) that the initial ADT
state has a fixed memory representation, which is the initial
data structure state $s_\emptyset^{\mathcal{D}}$. Hence, in step 3, the challenger applies
the adversary-selected sequence to the memory representation
of $s_\emptyset$.

If the adversary is able to identify the ADT operation 
sequence chosen by the challenger in step 3, then the adversary 
wins the game. Winning the game implies the adversary was 
able to determine the operation sequence that led to the current 
ADT state by observing the state’s memory representation, 
thereby breaking WHI. 
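The WHI game can be exercised empirically on toy structures. The following Python sketch is illustrative only: the set ADT, the classes `InsertionOrderSet` and `SortedSet`, and the helpers `whi_game` and `whi_advantage` are hypothetical names introduced here, not part of the formalism above. It plays the game many times and estimates $|\Pr[c' = c] - 1/2|$ for an adversary that reads the order of keys in memory.

```python
import random
from typing import Callable, List, Sequence, Tuple

Op = Tuple[str, int]  # hypothetical set-ADT operation: ("insert" | "delete", key)

class InsertionOrderSet:
    """Non-canonical layout: keys are kept in insertion order."""
    def __init__(self) -> None:
        self.mem: List[int] = []  # fixed initial data structure state

    def apply(self, op: Op) -> None:
        kind, key = op
        if kind == "insert" and key not in self.mem:
            self.mem.append(key)
        elif kind == "delete" and key in self.mem:
            self.mem.remove(key)

class SortedSet(InsertionOrderSet):
    """Canonical layout: keys are kept sorted, whatever the history."""
    def apply(self, op: Op) -> None:
        super().apply(op)
        self.mem.sort()

def whi_game(factory, seq0: Sequence[Op], seq1: Sequence[Op],
             adversary: Callable[[tuple], int], rng: random.Random) -> bool:
    """One round of the WHI game; returns True if the adversary guesses c."""
    c = rng.randrange(2)                   # step 3: fair coin
    ds = factory()                         # start from the initial data structure state
    for op in (seq0, seq1)[c]:
        ds.apply(op)
    return adversary(tuple(ds.mem)) == c   # steps 4-5

def whi_advantage(factory, seq0, seq1, adversary, trials: int = 4000, seed: int = 1) -> float:
    rng = random.Random(seed)
    wins = sum(whi_game(factory, seq0, seq1, adversary, rng) for _ in range(trials))
    return abs(wins / trials - 0.5)

# Both sequences take the ADT from the empty set to {1, 2}.
seq0 = [("insert", 1), ("insert", 2)]
seq1 = [("insert", 2), ("insert", 1)]
guess = lambda mem: 0 if mem == (1, 2) else 1

print(whi_advantage(InsertionOrderSet, seq0, seq1, guess))  # 0.5
print(whi_advantage(SortedSet, seq0, seq1, guess))          # close to 0
```

The insertion-ordered layout leaks the order of the two inserts with certainty, while the sorted (canonical) layout leaves this adversary at chance.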

B. Strong History Independence (SHI) 

Unlike WHI, SHI is applicable when an adversary can observe multiple memory representations throughout a sequence of operations, for example, an insider who can obtain periodic memory dumps. SHI requires that the adversary gain no more information about the sequence of operations between any two adjacent observations than what is inherently available from the corresponding ADT states.

Informally, a data structure is said to be strongly history independent if, for any two sequences of ADT operations $\delta_1$ and $\delta_2$ that take the ADT from a state $s_1$ to a state $s_2$, observations of any memory representations of states $s_1$ and $s_2$ give the adversary no advantage in guessing whether sequence $\delta_1$ or $\delta_2$ was used to go from $s_1$ to $s_2$.

We define strong history independence (SHI) by the following game:


Let $\mathcal{A}$ be an ADT, $\mathcal{M}$ be a bounded RAM machine model, and $\mathcal{D}$ be a data structure implementing $\mathcal{A}$ in $\mathcal{M}$, as per definitions 1, 2 and 3, respectively.

1) A probabilistic polynomial time-bounded adversary selects the following:

• Two ADT states $s_1$ and $s_2$; two sequences of ADT operations $\delta_0$ and $\delta_1$; and two sequences of ADT inputs $I_0$ and $I_1$; such that $O(\delta_0, s_1, I_0) \rightarrow (s_2, \tau)$ and $O(\delta_1, s_1, I_1) \rightarrow (s_2, \tau)$. Both $\delta_0$ and $\delta_1$ take the ADT from state $s_1$ to state $s_2$, producing the same output $\tau$.

• A memory representation $s_1^{\mathcal{M}}$ of ADT state $s_1$.

^For example, PRNG seeds when using randomization, or keys when using encryption.

2) The adversary sends $s_1$, $s_1^{\mathcal{M}}$, $\delta_0$, $\delta_1$, $I_0$ and $I_1$ to the challenger.

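3) The challenger flips a fair coin $c \in \{0,1\}$ and computes $O^{\mathcal{M}}(\delta^{\mathcal{M}}, s_1^{\mathcal{M}}, \alpha(I_c)) \rightarrow (s_2^{\mathcal{M}}, \tau^{\mathcal{M}})$, where $\delta^{\mathcal{M}} = \chi(\delta_c)$ and $\tau^{\mathcal{M}} = \beta(\tau)$. That is, the challenger applies the program sequence $\delta^{\mathcal{M}}$ corresponding to the ADT operation sequence $\delta_c$ to the data structure state $s_1^{\mathcal{M}}$, resulting in a memory representation $s_2^{\mathcal{M}}$ of state $s_2$ and a machine output $\tau^{\mathcal{M}}$.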

4) The challenger sends the memory representation $s_2^{\mathcal{M}}$ to the adversary.

5) The adversary outputs $c' \in \{0,1\}$.

The adversary wins the game if c' = c. 

$\mathcal{D}$ is said to be strongly history independent if the advantage of the adversary in winning the game, defined as $|\Pr[c' = c] - 1/2|$, is negligible (where "negligible" is defined over any implementation-specific security parameters of the machine programs).


Winning the game means that the adversary was able to determine the operation sequence that took the ADT from state $s_1$ to state $s_2$, thereby breaking SHI.
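The SHI game differs from the WHI game in that the adversary also fixes the starting representation $s_1^{\mathcal{M}}$. The following Python sketch uses hypothetical names and a toy set ADT whose insert program places the new key at a uniformly random position. It plays the SHI game with two nonempty sequences from $s_1 = \{1, 2, 3\}$ to $s_2 = \{1, 2\}$ and illustrates that randomizing layouts does not by itself hide history from an adversary who chose the starting layout.

```python
import random
from typing import Callable, List, Sequence, Tuple

Op = Tuple[str, int]  # hypothetical set-ADT operation: ("insert" | "delete", key)

def random_insert_program(mem: List[int], op: Op, rng: random.Random) -> None:
    """Insert at a uniformly random position; delete in place."""
    kind, key = op
    if kind == "insert" and key not in mem:
        mem.insert(rng.randrange(len(mem) + 1), key)
    elif kind == "delete" and key in mem:
        mem.remove(key)

def shi_round(start_mem: Sequence[int], seq0: Sequence[Op], seq1: Sequence[Op],
              adversary: Callable[[tuple], int], rng: random.Random) -> bool:
    """One SHI round; returns True if the adversary guesses the coin."""
    c = rng.randrange(2)                  # step 3: fair coin (hidden from the adversary)
    mem = list(start_mem)                 # adversary-chosen representation of s1
    for op in (seq0, seq1)[c]:
        random_insert_program(mem, op, rng)
    return adversary(tuple(mem)) == c     # steps 4-5

def shi_advantage(start_mem, seq0, seq1, adversary, trials: int = 20000, seed: int = 7) -> float:
    rng = random.Random(seed)
    wins = sum(shi_round(start_mem, seq0, seq1, adversary, rng) for _ in range(trials))
    return abs(wins / trials - 0.5)

# Both nonempty sequences take the ADT from s1 = {1, 2, 3} to s2 = {1, 2}.
seq0 = [("delete", 3)]
seq1 = [("delete", 3), ("delete", 2), ("insert", 2)]
guess = lambda mem: 0 if mem == (1, 2) else 1
adv = shi_advantage((1, 2, 3), seq0, seq1, guess)
print(adv)  # roughly 0.25
```

Starting from the layout (1, 2, 3), `seq0` always yields (1, 2), while `seq1` re-inserts 2 in front of 1 half of the time, so this adversary wins with probability about 3/4.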

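SHI implies WHI. If the ADT state $s_1$ chosen by the adversary in step 1 is the initial ADT state $s_\emptyset$, whose only memory representation is the initial data structure state $s^{\mathcal{M}}_\emptyset$, then the SHI game reduces to the WHI game of Section IV-A.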

C. Equivalence to Existing Definitions 

WHI and SHI were first introduced by Naor et al. [11]. Later, Hartline et al. [6] introduced new definitions for WHI and SHI, and showed that their definitions, although less complex, are equivalent to the ones proposed by Naor et al. Our game-based definitions of WHI and SHI (Sections IV-A and IV-B) differ slightly from the definitions by Hartline et al. Specifically, Hartline et al. assume a computationally unbounded adversary, whereas we address history independence in the presence of computationally bounded adversaries, which is more in line with reality. Further, new definitions were necessary to overcome imprecision in existing definitions and to develop a framework for new history independence notions beyond WHI and SHI. We detail the differences in the following.

Hartline et al. defined weak history independence as follows. 

Definition 5. Weak History Independence (WHI)

A data structure implementation is weakly history independent if, for any two sequences of operations X and Y that take the data structure from initialization to state A, the distribution over memory after X is performed is identical to the distribution after Y. That is:

$$(\emptyset \xrightarrow{X} A) \wedge (\emptyset \xrightarrow{Y} A) \Rightarrow \forall a \in A,\ \Pr[\emptyset \xrightarrow{X} a] = \Pr[\emptyset \xrightarrow{Y} a]$$

In the above definition, $\emptyset \xrightarrow{X} A$ denotes that an operation sequence X, when applied to the initial state $\emptyset$, results in state A. The notation $a \in A$ means that a is a memory representation of state A. $\Pr[\emptyset \xrightarrow{X} a]$ denotes the probability that sequence X, when applied to the initial state $\emptyset$, results in representation a.

Reconciling terminology 


Hartline et al. do not formalize the concepts of data structure, data structure state and memory representations. In their terminology, a data structure's state is referred to as the data structure's content, and the memory representation of a data structure state is the physical contents of memory that represent that state. We note that Naor et al. also used the same terminology in their definitions.

The WHI definition by Hartline et al. is imprecise in the 
following. 

• Operation inputs and outputs are not considered. 

• The same operation sequences are considered applicable to both data structure states and to memory representations. The mechanisms for the applicability are not specified.

• The connection between a data structure’s state and the 
state’s memory representations is not precisely specified. 

Following Golovin et al. [1], we use the ADT concept to model logical states (or content) and define a data structure as an ADT's implementation (Sections III-A to III-C). A data structure state is therefore the memory representation of an ADT state. Separating the ADT and data structure concepts enables us to precisely define memory representations (Section III-E) for various machine models, to understand history independence from the perspective of state transition graphs, and to build a framework for defining new history independence notions other than SHI and WHI (Section V).

To summarize the differences in terminology, what Hartline et al. refer to as a data structure state in their definitions is an ADT state in our model, and what they refer to as a memory representation is a data structure state in our model.

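For WHI, Hartline et al. require a data structure implementation to satisfy the following:

$$(\emptyset \xrightarrow{X} A) \wedge (\emptyset \xrightarrow{Y} A) \Rightarrow \forall a \in A,\ \Pr[\emptyset \xrightarrow{X} a] = \Pr[\emptyset \xrightarrow{Y} a]$$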

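Our game-based definition of WHI poses the following slightly relaxed requirement:

$$(\emptyset \xrightarrow{X} A) \wedge (\emptyset \xrightarrow{Y} A) \Rightarrow \forall a \in A,\ \left|\Pr[\emptyset \xrightarrow{X} a] - \Pr[\emptyset \xrightarrow{Y} a]\right| \text{ is negligible}$$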
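The relaxed condition relies on the standard notion of a negligible function of a security parameter $\lambda$, which this section uses without restating it. For reference, the usual definition is:

```latex
\mu(\lambda) \text{ is negligible} \iff
\forall c > 0 \;\; \exists \lambda_0 \;\; \forall \lambda > \lambda_0 : \;\; \mu(\lambda) < \lambda^{-c}
```

For example, $2^{-\lambda}$ is negligible, whereas $1/\lambda^{2}$ is not.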

We will show that the game-based WHI definition (Section IV-A) is equivalent to the relaxed condition above, that is, a data structure preserves WHI if and only if the relaxed condition holds. However, before we show the equivalence, we explain why the relaxation from identical distributions to a negligible difference is necessary.

As discussed in Section III-D, there are two known ways to achieve history independence. The first way is to make the ADT and data structure state transition graphs isomorphic. The second way is to make the data structure state transition graph randomized. The requirement for identical memory distributions in the condition of Hartline et al. rules out the use of randomization to achieve history independence^ in practice: a randomized data structure implementation relies on pseudorandom generators, and the security of pseudorandom generators rests on computational indistinguishability rather than on identical distributions. Therefore, the relaxed requirement of negligibility introduced in our condition is in fact not a limitation, but rather a reconciliation of the definition by Hartline et al. with reality, where adversaries are computationally bounded.

^The use of randomization to achieve weak history independence is discussed in Section IV-E.

Although Naor et al. proposed a WHI definition that requires identical distributions, they also used randomization to design a history independent data structure.

Equivalence of WHI definitions 

We now show that our game-based WHI definition (Section IV-A) is equivalent to a WHI definition based on the relaxed condition.

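We rewrite the relaxed condition in our notation as follows:

$$(s_\emptyset \xrightarrow{\delta_0} s) \wedge (s_\emptyset \xrightarrow{\delta_1} s) \Rightarrow \forall s^{\mathcal{M}} \in m(s),\ \left|\Pr[s^{\mathcal{M}}_\emptyset \xrightarrow{\delta_0^{\mathcal{M}}} s^{\mathcal{M}}] - \Pr[s^{\mathcal{M}}_\emptyset \xrightarrow{\delta_1^{\mathcal{M}}} s^{\mathcal{M}}]\right| \text{ is negligible}$$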

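Here, $\delta_0$ and $\delta_1$ are two ADT operation sequences that take the ADT from the initial state $s_\emptyset$ to state $s$. $s_\emptyset$ and $s^{\mathcal{M}}_\emptyset$ are the initial ADT and the initial data structure states, respectively. $\delta_0^{\mathcal{M}}$ and $\delta_1^{\mathcal{M}}$ are the machine program sequences corresponding to the ADT operation sequences $\delta_0$ and $\delta_1$, respectively.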

History independence only considers cases where the condition $(s_\emptyset \xrightarrow{\delta_0} s) \wedge (s_\emptyset \xrightarrow{\delta_1} s)$ is true, that is, both sequences $\delta_0$ and $\delta_1$ take the ADT to the same end state $s$. Otherwise, the ADT states themselves reveal history.

We therefore have two cases to consider:

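Case 1: The distributions are computationally distinguishable, that is,

$$\exists s^{\mathcal{M}} \in m(s) \text{ such that } \left|\Pr[s^{\mathcal{M}}_\emptyset \xrightarrow{\delta_0^{\mathcal{M}}} s^{\mathcal{M}}] - \Pr[s^{\mathcal{M}}_\emptyset \xrightarrow{\delta_1^{\mathcal{M}}} s^{\mathcal{M}}]\right| \text{ is non-negligible.}$$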

Now consider the following adversarial strategy. Given a data structure state $s^{\mathcal{M}}$ in step 4 of the WHI game, the adversary outputs the $c'$ for which $\delta_{c'}^{\mathcal{M}}$ has the higher probability of producing $s^{\mathcal{M}}$. For this strategy, $|\Pr[c' = c] - 1/2|$ is at least half of the non-negligible difference above and is therefore non-negligible. Hence, the data structure implementation does not preserve WHI.
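Case 1's strategy is a maximum-likelihood guess. The sketch below uses hypothetical helper names and explicitly given distributions (a computationally bounded adversary would have to estimate them, which this sketch ignores) to compute the exact advantage of that guess with rational arithmetic. For a fair coin, the advantage equals half the total variation distance between the two representation distributions, and is therefore at least half of the gap at any single representation.

```python
from fractions import Fraction
from typing import Dict, Hashable

Dist = Dict[Hashable, Fraction]  # memory representation -> probability

def ml_advantage(p0: Dist, p1: Dist) -> Fraction:
    """Exact |Pr[c' = c] - 1/2| when the adversary picks the more likely c'."""
    support = set(p0) | set(p1)
    win = sum(Fraction(1, 2) * max(p0.get(x, Fraction(0)), p1.get(x, Fraction(0)))
              for x in support)
    return abs(win - Fraction(1, 2))

def total_variation(p0: Dist, p1: Dist) -> Fraction:
    """Total variation distance between two finite distributions."""
    support = set(p0) | set(p1)
    return sum(abs(p0.get(x, Fraction(0)) - p1.get(x, Fraction(0))) for x in support) / 2

# Hypothetical layouts "a" and "b" of the same ADT state s.
p0 = {"a": Fraction(3, 4), "b": Fraction(1, 4)}  # distribution produced by delta_0^M
p1 = {"a": Fraction(1, 4), "b": Fraction(3, 4)}  # distribution produced by delta_1^M
print(ml_advantage(p0, p1))         # 1/4
print(total_variation(p0, p1) / 2)  # 1/4
```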

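Case 2: The distributions are computationally indistinguishable, that is,

$$\forall s^{\mathcal{M}} \in m(s),\ \left|\Pr[s^{\mathcal{M}}_\emptyset \xrightarrow{\delta_0^{\mathcal{M}}} s^{\mathcal{M}}] - \Pr[s^{\mathcal{M}}_\emptyset \xrightarrow{\delta_1^{\mathcal{M}}} s^{\mathcal{M}}]\right| \text{ is negligible.}$$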

In this case, from a computationally bounded adversary's perspective, the representation received in step 4 of the WHI game is equally likely to have been produced by either $\delta_0^{\mathcal{M}}$ or $\delta_1^{\mathcal{M}}$. Hence, observing a data structure state gives the adversary only a negligible advantage in guessing $c$. The data structure implementation therefore preserves WHI.

Equivalence of SHI definitions 

For strong history independence Hartline et al. proposed the 
following definition. 

Definition 6. Strong History Independence (SHI)

A data structure implementation is strongly history independent if, for any two (possibly empty) sequences of operations X and Y that take a data structure in state A to state B, the distribution over representations of B after X is performed on a representation a is identical to the distribution after Y is performed on a. That is:

$$(A \xrightarrow{X} B) \wedge (A \xrightarrow{Y} B) \Rightarrow \forall a \in A, \forall b \in B,\ \Pr[a \xrightarrow{X} b] = \Pr[a \xrightarrow{Y} b]$$

In the above definition, $A \xrightarrow{X} B$ denotes that an operation sequence X, when applied to state A, results in state B. The notation $a \in A$ means that a is a memory representation of state A. $\Pr[a \xrightarrow{X} b]$ denotes the probability that sequence X, when applied to memory representation a, results in representation b.

Similar to the case for WHI, our game-based SHI definition (Section IV-B) differs from the above definition only by relaxing the requirement for identical distributions. That is, for SHI, we require the following:

$$(A \xrightarrow{X} B) \wedge (A \xrightarrow{Y} B) \Rightarrow \forall a \in A, \forall b \in B,\ \left|\Pr[a \xrightarrow{X} b] - \Pr[a \xrightarrow{Y} b]\right| \text{ is negligible}$$

The equivalence of SHI definitions follows similarly to the 
case of WHI. 

Summary of Differences 

The main differences between our definitions and the definitions by Hartline et al. are the following:

• The definitions by Hartline et al. are imprecise about the concepts of data structures, states, and memory representations. We precisely formalize all of these concepts.

• Hartline et al. do not consider the case of computationally bounded adversaries. We permit computationally bounded adversaries and thus require a negligible difference, instead of equality, between memory distributions.

D. Canonical Representations 

Canonically (or uniquely) represented data structures have the property that each ADT state has a unique memory representation. Unique representation implies that the ADT and data structure state transition graphs are isomorphic^. Canonically represented data structures give very strong guarantees for history independence and in many cases are the only way to achieve history independence.

We first define canonically represented data structures and 
then discuss several important results pertaining to canonical 
representations and history independence. 

We also summarize (Table II) the scenarios where canonical 
representations are necessary for history independence across 
all combinations of types of programs, secrecy of random bits, 
adversarial computational ability, and the desired notion of 
history independence. 

Definition 7. Canonically represented data structure
A data structure $\mathcal{D}$ implementing an ADT $\mathcal{A}$ on a bounded RAM machine model $\mathcal{M}$ is canonically represented if each ADT state has a unique memory representation, that is, the mapping $m: S \rightarrow 2^{S^{\mathcal{M}}}$ is injective and $|m(s)| = 1$ for all $s \in S$, where $S$ is the set of all ADT states, $S^{\mathcal{M}}$ is the set of all data structure states, and $m(s)$ denotes the set of memory representations of an ADT state $s \in S$ as per definition 4.

^Isomorphism is discussed in Section III-D.

TABLE II

Identification of scenarios where canonical representations are necessary for history independence. N/A = not applicable.

Programs of $\mathcal{M}$ | Random bits hidden from adversary | Adversary computationally bounded | History independence desired | Canonical representations needed?
Randomized | Yes | Yes | WHI | No
N/A | N/A | No | N/A | Yes
N/A | N/A | N/A | SHI | Yes
Deterministic | N/A | N/A | N/A | Yes
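The condition $|m(s)| = 1$ can be probed on a toy example by enumerating short histories and collecting the layouts reached for each ADT state. This is only a bounded sketch under stated assumptions (a hypothetical set ADT over three keys, histories of length at most 4, and hypothetical helper names): it can refute canonicity but cannot prove it. Injectivity holds trivially here because each layout lists exactly the keys of its state.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

Op = Tuple[str, int]  # hypothetical set-ADT operation: ("insert" | "delete", key)

def run(seq, layout: str) -> Tuple[FrozenSet[int], Tuple[int, ...]]:
    """Apply seq from the empty set; return (ADT state, memory representation)."""
    mem: List[int] = []
    for kind, key in seq:
        if kind == "insert" and key not in mem:
            mem.append(key)
        elif kind == "delete" and key in mem:
            mem.remove(key)
        if layout == "sorted":
            mem.sort()
    return frozenset(mem), tuple(mem)

def looks_canonical(layout: str, keys=(1, 2, 3), max_len: int = 4) -> bool:
    """Check |m(s)| = 1 for every ADT state reached by histories of length <= max_len."""
    reps: Dict[FrozenSet[int], Set[Tuple[int, ...]]] = {}
    ops: List[Op] = [(kind, key) for kind in ("insert", "delete") for key in keys]
    for n in range(max_len + 1):
        for seq in product(ops, repeat=n):
            state, rep = run(seq, layout)
            reps.setdefault(state, set()).add(rep)
    return all(len(r) == 1 for r in reps.values())

print(looks_canonical("sorted"))     # True
print(looks_canonical("insertion"))  # False
```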

1) ADTs with infinite states 

The case of infinite ADT states is of particular importance for canonically represented data structure implementations on a bounded RAM machine model. Since the bounded RAM machine model has a finite number of available bits, its state space is finite and therefore not large enough to provide a unique representation for each ADT state when the ADT state space is infinite. The impossibility of unique representations implies that canonical representations for ADTs with infinite state sets are not possible in practice, since machines with infinite state spaces do not exist in reality. This straightforwardly leads to the following theorem.

Theorem 1. Canonically represented data structure implementations for ADTs with infinite states are impossible in practice.

However, prior work [11], [2], [1] has claimed designs for canonically represented data structures for the RAM model, in apparent contradiction to Theorem 1. The contradiction arises because prior work has implicitly considered ADTs with a finite state space. Specifically, the ADTs considered have fewer states than the total number of machine states.

2) Canonical Representations and SHI

Since history independence was first proposed [11], it has been known that canonically represented data structures support SHI. An interesting question posed in this context was whether canonical representations are necessary to achieve SHI. This question was answered by Hartline et al. [6], who showed that SHI cannot be achieved without canonical representations.

Theorem 2. A data structure is strongly history independent 
iff it is canonically represented. 

Proof. The proof by Hartline et al. [6] builds on the observation that if a data structure is not canonically represented, then an adversary can distinguish an empty sequence of operations from a nonempty sequence of operations. In the context of our game-based definition of SHI, we provide an equivalent proof.

Consider an ADT $\mathcal{A}$ and a data structure $\mathcal{D}$ implementing $\mathcal{A}$ on a bounded RAM machine model $\mathcal{M}$. Also assume that $\mathcal{D}$ is not canonically represented. Now, let $Q$ be an adversary and $C$ be the challenger in our game. $Q$ selects the following:

• Two ADT states $s_1$ and $s_2$.

• Two sequences of operations $\delta_0$ and $\delta_1$, and two sequences of ADT inputs $I_0$ and $I_1$, such that $O(\delta_0, s_1, I_0) \rightarrow (s_2, \tau)$ and $O(\delta_1, s_1, I_1) \rightarrow (s_2, \tau)$.

Since $\mathcal{D}$ is not canonically represented, some ADT state has at least two memory representations; let $s_1$ be such a state and let $a_1$ and $a_2$ be two distinct memory representations of $s_1$. We show that, with this setup, the adversary $Q$ can distinguish between an empty sequence of operations and a nonempty sequence of operations. Consider that $Q$ selects $\delta_0$ to be an empty sequence of operations, $\delta_1$ to be a nonempty sequence of operations that takes the ADT from $s_1$ back to $s_1$ (so that $s_2 = s_1$), and $a_1$ as the memory representation of $s_1$. In step 2 of the SHI game, the adversary sends $s_1$, $a_1$, $\delta_0$, $\delta_1$, $I_0$ and $I_1$ to $C$. In step 3 of the SHI game, $C$ flips a coin, applies either $\delta_0$ or $\delta_1$ to $a_1$, and returns the memory representation of the resulting state to $Q$. There are two possible cases for step 3:

1) $C$ selects $\delta_0$: there is no change in the ADT state or in the corresponding memory representation, since an empty sequence of operations does not cause state changes. Hence, in step 4, $C$ returns $a_1$ to $Q$.

2) $C$ selects $\delta_1$: the final ADT state reached after performing all the operations in $\delta_1$ is $s_1$, but the memory representation of $s_1$ in this case may be either $a_1$ or $a_2$. If, in step 4, $C$ returns $a_2$ to $Q$, then $Q$ knows that $C$ has applied $\delta_1$ to $a_1$. Hence, whenever $\delta_1$ produces $a_2$ with non-negligible probability, $Q$ wins with non-negligible advantage by outputting $c' = 1$ on $a_2$ and $c' = 0$ on $a_1$. This breaks strong history independence for $\mathcal{D}$.

□ 
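The argument of the proof can be run on a toy example. In this sketch (hypothetical names throughout), a set's insert and delete programs finish by shuffling the whole layout with `random.shuffle`, so any nonempty sequence ends in a uniformly random layout. Taking $\delta_0$ empty and $\delta_1$ a single insert of a key already in $s_1$ (so both sequences take $s_1$ to $s_1$), the adversary outputs 0 exactly when it gets back $a_1$. With three keys, $\delta_1$ returns $a_1$ with probability 1/6, so the expected advantage is $\frac{1}{2}(1 - \frac{1}{6}) = 5/12$.

```python
import random
from typing import List, Tuple

Op = Tuple[str, int]  # hypothetical set-ADT operation: ("insert" | "delete", key)

def shuffled_apply(mem: List[int], op: Op, rng: random.Random) -> None:
    """Hypothetical randomized program: update, then pick a uniformly random layout."""
    kind, key = op
    if kind == "insert" and key not in mem:
        mem.append(key)
    elif kind == "delete" and key in mem:
        mem.remove(key)
    rng.shuffle(mem)  # uniform over all orderings of the current keys

def empty_vs_nonempty_advantage(a1: Tuple[int, ...], trials: int = 20000, seed: int = 3) -> float:
    """Estimate Q's advantage in the SHI game with delta_0 empty, delta_1 nonempty."""
    rng = random.Random(seed)
    delta0: List[Op] = []                   # empty sequence: s1 -> s1
    delta1: List[Op] = [("insert", a1[0])]  # nonempty, but also s1 -> s1
    wins = 0
    for _ in range(trials):
        c = rng.randrange(2)
        mem = list(a1)                      # Q's chosen representation a1 of s1
        for op in (delta0, delta1)[c]:
            shuffled_apply(mem, op, rng)
        guess = 0 if tuple(mem) == tuple(a1) else 1
        wins += guess == c
    return abs(wins / trials - 0.5)

adv = empty_vs_nonempty_advantage((1, 2, 3))
print(adv)  # close to 5/12, about 0.417
```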

Canonical representations are not necessary for WHI 

In the absence of canonical representations, it has been shown that an adversary can distinguish an empty sequence of operations from a nonempty sequence of operations, thereby breaking SHI [6]. If operation sequences are always assumed to be nonempty, canonical representations are not necessary [6]. We define such a slightly relaxed notion of history independence, which permits only nonempty sequences, in Section V-A. Here, we show that WHI is preserved even for empty operation sequences in the absence of canonical representations.

Consider the WHI game from Section IV-A. The case in which the adversary selects two empty ADT operation sequences in step 1 is trivial, since empty sequences cause no state transitions and hence there is no history to be revealed.

Now, consider the case when the adversary selects an empty sequence $\delta_0$ and a nonempty sequence of ADT operations $\delta_1$. Both $\delta_0$ and $\delta_1$ are required to take the ADT from the initial state to the same end state. Since the empty sequence $\delta_0$ causes no state transitions, the end state for both sequences $\delta_0$ and $\delta_1$ will be the initial ADT state.
Then, in step 3, the challenger chooses either $\delta_0$ or $\delta_1$ and sends the resulting memory representation to the adversary. Since the end state for the two operation sequences is the initial ADT state, the memory representation sent to the adversary in step 4 will be the data structure initialization state. From the data structure definition (Section III-C), we know that the initial ADT state has a fixed, unique memory representation. Hence, irrespective of the nonempty sequence that the adversary selects in step 1, the adversary receives the initial ADT state's memory representation in step 4. Since the adversary receives the same representation each time, the adversary gains no advantage in guessing whether $\delta_0$ or $\delta_1$ was chosen by the challenger in step 3.

ADT states other than the initial ADT state can have multiple memory representations. Multiple representations of ADT states do not break WHI as long as it is ensured that, from the adversary's perspective, all representations of the current ADT state are equally likely to be observed. Randomization achieves equal likelihood for all representations of an ADT state (Section IV-E). In Section III-D we covered the use of randomization for WHI from the perspective of state transition graphs, and in Section IV-E we show how to achieve WHI by randomization in practice. For deterministic machine programs (also described in Section IV-E), WHI too requires canonical representations.

3) Canonical representations and adversary models 

Canonically represented data structures are history independent in the strongest sense: they are secure even against a computationally unbounded adversary [1]. For a computationally unbounded adversary, canonical representations are also necessary for WHI.

E. Randomization and HI 

In Section III-D, we introduced the use of randomization for 
WHI from the point of view of state transition graphs. 

In practice, randomization is achieved using the machine programs implementing the ADT operations. An ADT operation $o$ takes the ADT from a state $s_1$ to a state $s_2$. A machine program implementing $o$ takes the data structure from a memory representation of state $s_1$ to a memory representation of state $s_2$. Since each ADT state can have several memory representations (Section III-D), the program has a choice amongst all representations of state $s_2$ and picks one representation as the result of a transition. If, starting from a fixed memory representation of $s_1$ and a fixed input, the program takes the data structure to the same representation of $s_2$ on each execution, then the program is said to be deterministic. If on each execution the resulting representation is chosen uniformly at random from all possible representations of state $s_2$, then the program is said to be randomized.

To illustrate, consider an ADT operation $o$ and a machine program $p$ implementing $o$. Let $o(s_1, i) \rightarrow (s_2, \tau)$ denote the transition from ADT state $s_1$ to ADT state $s_2$ using an ADT input $i$ and producing an ADT output $\tau$. Also, let $m(s_1)$ and $m(s_2)$ denote the sets of memory representations of states $s_1$ and $s_2$, respectively. Then, for history independence, the following must hold for program $p$:

$$\Pr[p(s_1^{\mathcal{M}}, \alpha(i)) \rightarrow (s_2^{\mathcal{M}}, \beta(\tau))] = \frac{1}{|m(s_2)|}, \quad \forall s_1^{\mathcal{M}} \in m(s_1),\ \forall s_2^{\mathcal{M}} \in m(s_2)$$

Here, $\alpha(i)$ and $\beta(\tau)$ are the machine representations of ADT input $i$ and ADT output $\tau$, respectively.
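The uniformity condition can be checked exactly on a small example. The sketch below assumes a hypothetical insert program for a set that appends the new key and then applies a Fisher-Yates shuffle, and it assumes that $m(s)$ is the set of all orderings of the keys in $s$. It enumerates every choice sequence of the shuffle to obtain the exact output distribution and confirms that, from either layout of $s_1 = \{1, 2\}$, each of the $3! = 6$ layouts of $s_2 = \{1, 2, 3\}$ has probability $1/|m(s_2)| = 1/6$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product
from typing import Dict, Tuple

Layout = Tuple[int, ...]

def fisher_yates_distribution(layout: Layout) -> Dict[Layout, Fraction]:
    """Exact output distribution of a Fisher-Yates shuffle, by enumerating every choice."""
    n = len(layout)
    positions = list(range(n - 1, 0, -1))          # i = n-1, ..., 1
    counts: Counter = Counter()
    for choices in product(*(range(i + 1) for i in positions)):
        mem = list(layout)
        for i, j in zip(positions, choices):       # swap mem[i] with mem[j], 0 <= j <= i
            mem[i], mem[j] = mem[j], mem[i]
        counts[tuple(mem)] += 1
    total = sum(counts.values())
    return {rep: Fraction(k, total) for rep, k in counts.items()}

def insert_program(s1_layout: Layout, key: int) -> Dict[Layout, Fraction]:
    """Hypothetical randomized program p for insert(key): append, then shuffle."""
    return fisher_yates_distribution(tuple(s1_layout) + (key,))

m_s2 = set(permutations((1, 2, 3)))                 # assumed m(s2): all orderings
for s1_layout in permutations((1, 2)):              # every s1^M in m(s1)
    dist = insert_program(s1_layout, 3)
    print(s1_layout, set(dist) == m_s2, set(dist.values()))
```

Each printed line reports that the program reaches every layout of $s_2$, each with probability 1/6, regardless of which layout of $s_1$ it started from.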

Randomization here refers to the selection of memory 
representations for ADT states and not to program outputs. 
A program’s output is the machine representation of the 
corresponding ADT operation’s output. 

If randomization is used for history independence, then random choices made by the machine programs must be hidden from the adversary. If the adversary has knowledge of the random bits, then from the adversary's point of view the machine programs are deterministic. Data structures with deterministic machine programs require canonical representations.
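To see why the random bits must stay hidden, the sketch below (hypothetical names throughout) runs a randomized set whose programs draw from `random.Random(seed)`. An adversary who knows the seed simply replays both candidate histories and compares the results with the observed layout; from its point of view the randomized programs have become deterministic.

```python
import random
from typing import List, Optional, Sequence, Tuple

Op = Tuple[str, int]  # hypothetical set-ADT operation: ("insert" | "delete", key)

def run_seeded(seq: Sequence[Op], seed: int) -> Tuple[int, ...]:
    """Hypothetical randomized set: every program ends with a seeded shuffle."""
    rng = random.Random(seed)
    mem: List[int] = []
    for kind, key in seq:
        if kind == "insert" and key not in mem:
            mem.append(key)
        elif kind == "delete" and key in mem:
            mem.remove(key)
        rng.shuffle(mem)
    return tuple(mem)

def replay_adversary(observed, seq0, seq1, seed: int) -> Optional[int]:
    """An adversary that knows the seed re-runs both histories and compares."""
    r0, r1 = run_seeded(seq0, seed), run_seeded(seq1, seed)
    if r0 != r1:
        return 0 if observed == r0 else 1
    return None  # identical layouts: this seed gives no information

seq0 = [("insert", 1), ("insert", 2), ("insert", 3)]
seq1 = [("insert", 3), ("insert", 2), ("insert", 1)]
decided = []
for seed in range(200):
    c = seed % 2                                    # challenger's choice
    observed = run_seeded((seq0, seq1)[c], seed)
    guess = replay_adversary(observed, seq0, seq1, seed)
    if guess is not None:
        decided.append(guess == c)
print(len(decided), all(decided))
```

Every decided guess is correct, which is the sense in which known random bits turn a randomized structure into a deterministic one.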

V. Generalizing History Independence 

SHI is a very strong notion of history independence requiring canonical representations [1], [6]. Canonically represented data structures are not efficient [12]. For heap and queue data structures, Buchbinder et al. [12] show that certain operations that require logarithmic time under WHI take linear time under SHI. Hence, it is worth questioning the need for canonical representations for history independence. Many scenarios may not require such a strong notion, and for them SHI data structures with canonical representations are unnecessarily inefficient.

Some scenarios that can be efficiently realized by new 
history independence notions: 

• Hiding evidence of specific operations only. For example, hiding only the fact that a specific data item has been deleted in the past, as required by regulations [7].

• A most recently used (MRU) caching or a journaling system by definition reveals the last k operations. Hence, journaling and caching require a new notion of history independence, wherein no history is revealed other than the last k operations [11].

• Revealing only the number of times each operation is performed [1]. For example, in a file-sharing application, disclosing file-access counts may be permissible, but not the access order.

A straightforward way to define new notions of history independence is to provide a new game-based definition for each scenario. However, defining distinct scenario-specific games can quickly become tedious. Instead, we introduce a definitional framework that can accommodate a broad spectrum of history independence notions. We term the new framework $\Delta$ history independence ($\Delta$HI), where $\Delta$ is the parameter determining the history independence flavor. As we shall see, $\Delta$HI also captures both WHI and SHI. In addition, $\Delta$HI helps to reason about the history revealed or concealed by existing data structures that were designed without history independence in mind.

A. $\Delta$ History Independence ($\Delta$HI)

The WHI and SHI games (Sections IV-A and IV-B, respectively) are defined over a subset of ADT operation sequences. For WHI, the adversary is permitted to select sequences that take the ADT from initialization to the same end state. For SHI, the permitted sequences are ones that take the ADT from the same starting state to the same ending state. The selection is made by the adversary in step 1 of both the WHI and SHI games. Hence, the initial selection permitted to the adversary determines the history that is desired to be revealed or hidden. By generalizing the selection step, we can accommodate a broad spectrum of history independence notions. We achieve the generalization in $\Delta$HI, defined by the following game:


Let $\mathcal{A}$ be an ADT, $\mathcal{M}$ be a bounded RAM machine model, and $\mathcal{D}$ be a data structure implementing $\mathcal{A}$ in $\mathcal{M}$, as per definitions 1, 2 and 3, respectively. Also, let $\Phi$ be the set of all ADT operation sequences, $T$ be the set of all ADT input sequences, and $\Delta$ be a function $\Delta: S \times S \times \Phi \times \Phi \times T \times T \rightarrow \{0, 1\}$, where $S$ is the set of all ADT states.

1) A probabilistic polynomial time-bounded adversary selects the following:

• Two ADT states $s_1$ and $s_2$; two sequences of ADT operations $\delta_0$ and $\delta_1$; and two sequences of ADT inputs $I_0$ and $I_1$; such that $\Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = 1$.

• A memory representation $s_1^{\mathcal{M}}$ of ADT state $s_1$.

2) The adversary sends $s_1$, $s_1^{\mathcal{M}}$, $\delta_0$, $\delta_1$, $I_0$ and $I_1$ to the challenger.

3) The challenger flips a fair coin $c \in \{0,1\}$ and computes $O^{\mathcal{M}}(\delta^{\mathcal{M}}, s_1^{\mathcal{M}}, \alpha(I_c)) \rightarrow (s_2^{\mathcal{M}}, \tau^{\mathcal{M}})$, where $\delta^{\mathcal{M}} = \chi(\delta_c)$. That is, the challenger applies the program sequence $\delta^{\mathcal{M}}$ corresponding to the ADT operation sequence $\delta_c$ to the data structure state $s_1^{\mathcal{M}}$, resulting in a memory representation $s_2^{\mathcal{M}}$ and a machine output $\tau^{\mathcal{M}}$.

4) The challenger sends the memory representation $s_2^{\mathcal{M}}$ and the machine output $\tau^{\mathcal{M}}$ to the adversary.

5) The adversary outputs $c' \in \{0,1\}$.

The adversary wins the game if $c' = c$.

$\mathcal{D}$ is said to be $\Delta$ history independent if the advantage of the adversary in winning the game, defined as $|\Pr[c' = c] - 1/2|$, is negligible (where "negligible" is defined over any implementation-specific security parameters of the machine programs).
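The $\Delta$HI game is the SHI game with the admissibility test in step 1 replaced by $\Delta$. The following sketch makes that explicit under simplifying assumptions: a toy set ADT whose operations carry their key (so the input sequences are folded into the operation sequences and $\Delta$ takes four arguments), no machine outputs, and hypothetical function names. The challenger refuses any selection with $\Delta = 0$.

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

Op = Tuple[str, int]      # hypothetical set-ADT operation; the key doubles as the ADT input
State = FrozenSet[int]
Delta = Callable[[State, State, Sequence[Op], Sequence[Op]], bool]

def adt_apply(seq: Sequence[Op], s: State) -> State:
    """ADT transition for the toy set ADT (its operations return no output)."""
    out = set(s)
    for kind, key in seq:
        if kind == "insert":
            out.add(key)
        else:
            out.discard(key)
    return frozenset(out)

def delta_shi(s1: State, s2: State, d0: Sequence[Op], d1: Sequence[Op]) -> bool:
    """Delta for SHI: both sequences take s1 to s2."""
    return adt_apply(d0, s1) == s2 and adt_apply(d1, s1) == s2

def sorted_program(mem: List[int], op: Op) -> None:
    """Canonical machine program: the layout is always the sorted list of keys."""
    kind, key = op
    if kind == "insert" and key not in mem:
        mem.append(key)
    elif kind == "delete" and key in mem:
        mem.remove(key)
    mem.sort()

def delta_hi_round(delta: Delta, s1: State, s2: State, d0, d1, start_mem,
                   program, adversary, rng: random.Random) -> bool:
    """One round of the Delta-HI game; returns True if the adversary wins."""
    if not delta(s1, s2, d0, d1):          # step 1: the selection must satisfy Delta
        raise ValueError("selection not permitted: Delta = 0")
    c = rng.randrange(2)                   # step 3: fair coin
    mem = list(start_mem)                  # adversary-chosen layout of s1
    for op in (d0, d1)[c]:
        program(mem, op)
    return adversary(tuple(mem)) == c      # steps 4-5

rng = random.Random(0)
s1, s2 = frozenset({1}), frozenset({1, 2, 3})
d0 = [("insert", 2), ("insert", 3)]
d1 = [("insert", 3), ("insert", 2)]
always_zero = lambda mem: 0
wins = sum(delta_hi_round(delta_shi, s1, s2, d0, d1, [1], sorted_program, always_zero, rng)
           for _ in range(2000))
print(abs(wins / 2000 - 0.5))  # close to 0: the sorted layout hides the order
```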


Function $\Delta$ determines the pairs of ADT states, ADT operation sequences, and ADT input sequences that the adversary is permitted to select in step 1 of the $\Delta$HI game. For the adversary-selected ADT states, operation sequences, and input sequences, the $\Delta$HI game can be played, and the data structure implementation is required to ensure that the advantage of the adversary is negligible. Thus, for a given ADT, $\Delta$ defines two sets,

$$\Pi_\Delta = \{(s_1, s_2, \delta_0, \delta_1, I_0, I_1) \mid \Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = 1\}$$

and

$$\overline{\Pi}_\Delta = \{(s_1, s_2, \delta_0, \delta_1, I_0, I_1) \mid \Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = 0\}.$$

For all tuples in $\Pi_\Delta$, history independence is preserved, that is, neither the ADT nor the data structure implementation reveals the operation sequence selected by the challenger in step 3. For all tuples in $\overline{\Pi}_\Delta$, history independence is not required to be preserved since the ADT itself reveals the sequence of operations used.

A careful choice of $\Delta$ allows us to precisely define both SHI and WHI, as well as a broad spectrum of new history independence notions. In the following, we illustrate the use of the $\Delta$HI framework to define some familiar history independence notions and a few previously unconsidered notions of history independence.


1) Strong History Independence (SHI) 

We discussed SHI in Section IV-B. Here, we define the function $\Delta$ for SHI.

$$\Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = \begin{cases} 1 & \text{if } O(\delta_0, s_1, I_0) \rightarrow (s_2, \tau) \text{ and } O(\delta_1, s_1, I_1) \rightarrow (s_2, \tau) \\ 0 & \text{otherwise} \end{cases}$$

For SHI, the adversary's advantage in the $\Delta$HI game must be negligible when, in step 1, the adversary selects any two ADT operation sequences that take the ADT from a state $s_1$ to a state $s_2$, producing the same ADT output $\tau$.


2) Weak History Independence (WHI) 

Refer to Section IV-A for a discussion of WHI, which requires the following definition of $\Delta$.

$$\Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = \begin{cases} 1 & \text{if } s_1 = s_\emptyset \text{ and } O(\delta_0, s_1, I_0) \rightarrow (s_2, \tau) \text{ and } O(\delta_1, s_1, I_1) \rightarrow (s_2, \tau) \\ 0 & \text{otherwise} \end{cases}$$

Since WHI permits the adversary to observe a single data structure state, the adversary chooses only the end state $s_2$ in step 1 of the $\Delta$HI game. The starting state on which sequences $\delta_0$ and $\delta_1$ are applied is the initial ADT state $s_\emptyset$.


3) Null history independence ($\emptyset$HI)

Under null history independence, a data structure conceals no history except in the trivial case when the ADT operation sequences and ADT input sequences selected by the adversary in the $\Delta$HI game are identical. An example of a data structure with $\emptyset$HI is an append-only log. We can reflect $\emptyset$HI using the following.

$$\Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = \begin{cases} 1 & \text{if } O(\delta_0, s_1, I_0) \rightarrow (s_2, \tau) \text{ and } O(\delta_1, s_1, I_1) \rightarrow (s_2, \tau) \text{ and } \delta_0 = \delta_1 \text{ and } I_0 = I_1 \\ 0 & \text{otherwise} \end{cases}$$


4) SHI* 

The necessity of canonical representations for SHI was proven by Hartline et al. [6]. Their proof builds on the observation that if a data structure is not canonically represented, then an adversary can distinguish an empty sequence of operations from a nonempty sequence. Hartline et al. [6] then proposed SHI*, which is defined over nonempty ADT operation sequences. SHI* data structures were initially expected to be more efficient than data structures providing SHI. However, Hartline et al. [6] found that SHI* still poses very strict requirements on a data structure and may not differ from SHI in asymptotic complexity.

Here, we give the $\Delta$ function for SHI*.

$$\Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = \begin{cases} 1 & \text{if } O(\delta_0, s_1, I_0) \rightarrow (s_2, \tau) \text{ and } O(\delta_1, s_1, I_1) \rightarrow (s_2, \tau) \text{ and } |\delta_0| > 0 \text{ and } |\delta_1| > 0 \\ 0 & \text{otherwise} \end{cases}$$


SHI* closely resembles SHI except that the operation sequences $\delta_0$ and $\delta_1$ must be nonempty.
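Under the same simplifying assumptions as before (a toy set ADT whose keys double as inputs, no outputs, all names hypothetical), the $\Delta$ functions for SHI, WHI, $\emptyset$HI and SHI* become short predicates. Their differences show up on concrete selections: reordering two inserts is admissible for SHI and SHI* but not for WHI (the start is not $s_\emptyset$) or $\emptyset$HI, while an empty sequence versus a no-op insert is admissible for SHI but not for SHI*.

```python
from typing import FrozenSet, Sequence, Tuple

Op = Tuple[str, int]          # hypothetical set-ADT operation (key doubles as input)
State = FrozenSet[int]
S_EMPTY: State = frozenset()  # initial ADT state of the toy set ADT

def adt_apply(seq: Sequence[Op], s: State) -> State:
    """ADT transition for the toy set ADT."""
    out = set(s)
    for kind, key in seq:
        if kind == "insert":
            out.add(key)
        else:
            out.discard(key)
    return frozenset(out)

def delta_shi(s1: State, s2: State, d0: Sequence[Op], d1: Sequence[Op]) -> bool:
    return adt_apply(d0, s1) == s2 and adt_apply(d1, s1) == s2

def delta_whi(s1, s2, d0, d1) -> bool:
    return s1 == S_EMPTY and delta_shi(s1, s2, d0, d1)

def delta_null(s1, s2, d0, d1) -> bool:
    # Inputs are folded into the operations, so d0 == d1 also means I0 == I1.
    return delta_shi(s1, s2, d0, d1) and list(d0) == list(d1)

def delta_shi_star(s1, s2, d0, d1) -> bool:
    return delta_shi(s1, s2, d0, d1) and len(d0) > 0 and len(d1) > 0

s1, s2 = frozenset({1}), frozenset({1, 2, 3})
reorder = ([("insert", 2), ("insert", 3)], [("insert", 3), ("insert", 2)])
print([f(s1, s2, *reorder) for f in (delta_shi, delta_whi, delta_null, delta_shi_star)])
# [True, False, False, True]
empty_vs_noop = ([], [("insert", 1)])  # both leave s1 unchanged
print(delta_shi(s1, s1, *empty_vs_noop), delta_shi_star(s1, s1, *empty_vs_noop))
# True False
```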
5) Reveal last k operations (Most Recently Used Cache, Journal)

System features such as caching and journaling by definition reveal, from the ADT state itself, the last $k$ operations performed. Thus, for caching and journaling, we need to define a $\Delta$ function such that no historical information other than the last $k$ operations is leaked from the memory representations. We define the new notion as follows.

Let $\delta[i]$ denote the $i$-th operation in the sequence $\delta$. Also, let $\delta[i, j]$ denote the subsequence $(\delta[i], \ldots, \delta[j])$ of $\delta$, with $i \le j$.

$$\Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = \begin{cases} 1 & \text{if } O(\delta_0, s_1, I_0) \rightarrow (s_2, \tau) \text{ and } O(\delta_1, s_1, I_1) \rightarrow (s_2, \tau) \text{ and } |\delta_0| \ge k \text{ and } |\delta_1| \ge k \\ & \text{and } \delta_0[|\delta_0| - k + 1, |\delta_0|] = \delta_1[|\delta_1| - k + 1, |\delta_1|] \\ 0 & \text{otherwise} \end{cases}$$

The adversary is permitted to choose two sequences $\delta_0$ and $\delta_1$ such that the last $k$ operations in $\delta_0$ and $\delta_1$ are the same. Other than the last $k$ operations, sequences $\delta_0$ and $\delta_1$ may differ. Yet, the adversary should be unable to identify the sequence chosen in step 3.
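A sketch of the last-$k$ $\Delta$ function on the toy set ADT (hypothetical names; keys double as inputs). Python slices play the role of $\delta[|\delta| - k + 1, |\delta|]$; the start index is computed explicitly because `d[-0:]` would return the whole sequence when $k = 0$, in which case the predicate reduces to the SHI test.

```python
from typing import FrozenSet, Sequence, Tuple

Op = Tuple[str, int]        # hypothetical set-ADT operation (key doubles as input)
State = FrozenSet[int]

def adt_apply(seq: Sequence[Op], s: State) -> State:
    """ADT transition for the toy set ADT."""
    out = set(s)
    for kind, key in seq:
        if kind == "insert":
            out.add(key)
        else:
            out.discard(key)
    return frozenset(out)

def delta_last_k(s1: State, s2: State, d0: Sequence[Op], d1: Sequence[Op], k: int) -> bool:
    """Delta for 'reveal only the last k operations'."""
    if len(d0) < k or len(d1) < k:
        return False
    # Slice from an explicit start index; d[-0:] would return the whole sequence.
    same_tail = list(d0[len(d0) - k:]) == list(d1[len(d1) - k:])
    return same_tail and adt_apply(d0, s1) == s2 and adt_apply(d1, s1) == s2

empty: State = frozenset()
d0 = [("insert", 1), ("insert", 2), ("delete", 1), ("insert", 3)]
d1 = [("insert", 2), ("insert", 3)]
print(delta_last_k(empty, frozenset({2, 3}), d0, d1, k=1))  # True
print(delta_last_k(empty, frozenset({2, 3}), d0, d1, k=2))  # False
```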

6) Operation-Agnostic History Independence (OAHI) 

Consider a secure deletion application that wishes to destroy any evidence of a delete operation performed in the past. That is, an adversary observing the memory representations should be unable to detect whether a delete operation was performed, other than by guessing. More generally, the need to conceal a particular operation extends to any ADT operation, not just delete. We introduce a new notion of history independence that conceals specific ADT operations, referred to as operation-agnostic history independence (OAHI). A data structure that is Δ history independent for the following Δ function guarantees operation-agnostic history independence for an ADT operation o.

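$$
\Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) =
\begin{cases}
1 & \text{if } O(\delta_0, s_1, I_0) \rightarrow (s_2, \tau) \text{ and } O(\delta_1, s_1, I_1) \rightarrow (s_2, \tau) \text{ and } o \in \delta_0 \text{ and } o \notin \delta_1,\\
0 & \text{otherwise.}
\end{cases}
$$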

In OAHI, neither the presence of operation o in $\delta_0$ nor the absence of o in $\delta_1$ gives the adversary any advantage in guessing the sequence chosen by the challenger in step 3 of the ΔHI game.
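A minimal Python sketch of the OAHI Δ function for a hypothetical toy set ADT follows. It assumes that "o ∈ δ" means that some operation in δ has the name o (for example "del"), and, as before, it compares only final ADT states and omits inputs and outputs.

```python
from typing import FrozenSet, List, Tuple

Op = Tuple[str, int]      # hypothetical toy operation: ("ins", x) or ("del", x)
State = FrozenSet[int]


def run(ops: List[Op], s: State) -> State:
    """Apply ops to a toy set ADT; stands in for O(delta, s, I)."""
    cur = set(s)
    for name, x in ops:
        if name == "ins":
            cur.add(x)
        elif name == "del":
            cur.discard(x)
        else:
            raise ValueError(f"unknown operation {name!r}")
    return frozenset(cur)


def contains_op(ops: List[Op], o: str) -> bool:
    """True if any operation in ops has the name o."""
    return any(name == o for name, _ in ops)


def delta_oahi(s1: State, s2: State, d0: List[Op], d1: List[Op], o: str) -> int:
    """1 iff both sequences reach s2, o occurs in d0, and o does not occur in d1."""
    both_reach_s2 = run(d0, s1) == s2 and run(d1, s1) == s2
    return int(both_reach_s2 and contains_op(d0, o) and not contains_op(d1, o))


with_delete = [("ins", 1), ("ins", 2), ("del", 2)]
without_delete = [("ins", 1)]
print(delta_oahi(frozenset(), frozenset({1}), with_delete, without_delete, "del"))  # 1
print(delta_oahi(frozenset(), frozenset({1}), without_delete, with_delete, "del"))  # 0
```

The second call returns 0 because the predicate is asymmetric: the sequence containing o must be $\delta_0$.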

B. Measuring History Independence 

We have seen that new notions of history independence can be easily derived from Δ history independence by defining the appropriate Δ function. In this section, we present an intuitive way of comparing Δ functions on the basis of the history they require to be concealed or preserved.

For a given Δ function, we defined the set $\Pi_\Delta$ (Section V-A) that represents all combinations of ADT states, operation sequences, and ADT input sequences for which the adversary's advantage is negligible in the Δ history independence game. That is, history independence is preserved for all members of $\Pi_\Delta$. One insight is to use the cardinality of $\Pi_\Delta$ as a measure of history independence.


Recall from Section III-C that an ADT can have several data structure implementations. Let $\mathcal{D}$ and $\mathcal{D}'$ be two implementations of an ADT A, such that $\mathcal{D}$ is Δ history independent and $\mathcal{D}'$ is Δ' history independent for two functions Δ and Δ'. Now, we say that $\mathcal{D}$ is more history independent than $\mathcal{D}'$ if $\Pi_{\Delta'} \subset \Pi_\Delta$.

Note that $|\Pi_\Delta| > |\Pi_{\Delta'}|$ alone does not imply that $\mathcal{D}$ is more history independent than $\mathcal{D}'$, since an application may be more sensitive to the history preserved by $\mathcal{D}'$ than to the history preserved by $\mathcal{D}$. Only in the case where $\Pi_{\Delta'} \subset \Pi_\Delta$ can we consider $\mathcal{D}$ to be a more history independent implementation than $\mathcal{D}'$.
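To make the containment test concrete, the following Python sketch enumerates $\Pi_\Delta$ on a tiny, hypothetical domain: a set ADT over {1, 2}, initial and final states drawn from all subsets, and nonempty operation sequences of length at most 2. It compares only final ADT states and omits inputs and outputs. On this finite domain, $\Pi$ for OAHI with o = delete turns out to be a proper subset of $\Pi$ for SHI*, so, in the sense defined above, an implementation satisfying SHI* is more history independent than one known only to satisfy OAHI for deletes. A real $\Pi_\Delta$ is infinite, so the sketch only illustrates the check.

```python
from itertools import chain, combinations, product

UNIVERSE = (1, 2)
OPS = [(name, x) for name in ("ins", "del") for x in UNIVERSE]
STATES = [frozenset(c)
          for c in chain.from_iterable(combinations(UNIVERSE, r)
                                       for r in range(len(UNIVERSE) + 1))]
SEQS = [tuple(p) for n in (1, 2) for p in product(OPS, repeat=n)]  # nonempty, length <= 2


def run(ops, s):
    """Toy set ADT: apply ("ins", x) / ("del", x) operations to state s."""
    cur = set(s)
    for name, x in ops:
        if name == "ins":
            cur.add(x)
        else:                      # OPS only contains "ins" and "del"
            cur.discard(x)
    return frozenset(cur)


def delta_shi_star(s1, s2, d0, d1):
    return run(d0, s1) == s2 and run(d1, s1) == s2 and len(d0) > 0 and len(d1) > 0


def delta_oahi_delete(s1, s2, d0, d1):
    def has_delete(d):
        return any(name == "del" for name, _ in d)
    return (run(d0, s1) == s2 and run(d1, s1) == s2
            and has_delete(d0) and not has_delete(d1))


def pi(delta):
    """Enumerate Pi_Delta restricted to the finite toy domain."""
    return {(s1, s2, d0, d1)
            for s1, s2 in product(STATES, repeat=2)
            for d0, d1 in product(SEQS, repeat=2)
            if delta(s1, s2, d0, d1)}


pi_shi_star = pi(delta_shi_star)
pi_oahi = pi(delta_oahi_delete)
print(pi_oahi < pi_shi_star)  # True: proper subset on this domain
```

Python's `<` on sets tests for a proper subset, which matches the strict containment used in the definition.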


C. Deriving History Independence 

In order to provide a history independent implementation for an ADT, we first require the Δ function to be precisely defined. Then, a history independent data structure can be designed that satisfies the Δ function. Satisfying a Δ function means that the adversary's advantage is always negligible in the Δ history independence game. In effect, so far we have approached history independence as a define-then-design process.

However, data structures have been in use for a long time, and most data structures have been designed for efficiency or functionality with no history independence in mind. A natural question then arises: are there any meaningful⁹ Δ functions satisfied by existing data structures?

A data structure can be Δ history independent for several Δ functions. For example, a data structure that satisfies SHI also satisfies WHI, OAHI, and OIAHI. Hence, for a given data structure $\mathcal{D}$, finding a Δ function may not be a particularly difficult task. It may be more useful instead to determine an uncontained Δ function for $\mathcal{D}$. We define an uncontained Δ function for a data structure as follows.


Definition 8. Uncontained Δ function

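A Δ function for a data structure $\mathcal{D}$ is uncontained if $\mathcal{D}$ is Δ history independent and there exists no Δ' such that $\mathcal{D}$ is also Δ' history independent and $\Pi_\Delta \subset \Pi_{\Delta'}$, where

$$
\Pi_\Delta = \{(s_1, s_2, \delta_0, \delta_1, I_0, I_1) \mid \Delta(s_1, s_2, \delta_0, \delta_1, I_0, I_1) = 1\},
$$
$$
\Pi_{\Delta'} = \{(s'_1, s'_2, \delta'_0, \delta'_1, I'_0, I'_1) \mid \Delta'(s'_1, s'_2, \delta'_0, \delta'_1, I'_0, I'_1) = 1\};
$$

$s_1$, $s_2$, $s'_1$, and $s'_2$ are ADT states; $\delta_0$, $\delta_1$, $\delta'_0$, and $\delta'_1$ are ADT operation sequences; and $I_0$, $I_1$, $I'_0$, and $I'_1$ are ADT input sequences.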


We can determine an uncontained Δ function for existing data structures on a case-by-case basis. An open question is whether there exists a general mechanism for deriving an uncontained Δ function for a given data structure.


VI. From Theory To Practice 

A. Defining Machine States 

The RAM model of execution described in Section III-B consists of two components, the RAM and the CPU. Hence, the machine state for the RAM model includes bits from both the RAM and the CPU. In general, the machine state for a system-wide machine model will comprise the states of all system components. A system-wide history independent implementation then has to consider each individual component's characteristics along with the interactions between the components. Providing system-wide history independence is therefore challenging.

⁹Δ = 0 is satisfied by all data structures. Hence, we need to determine Δ functions that are more useful in practice.

However, in practice an adversary may have access to only a subset of system components. In this case, for the purpose of history independence, the machine state can be defined over the adversary-accessible components only. For example, history independent data structures proposed in existing work (Section X) are designed with the RAM model in mind. However, the machine states considered for history independence only include bits from the RAM and exclude the CPU.

B. Building History Independent Systems 

Various techniques for designing history independent data structures for commonly used ADTs such as queues, stacks, and hash tables have been proposed [1]. Our focus, on the other hand, is designing systems with end-to-end history independent characteristics. The difference between history independent implementations for simple ADTs, such as stacks and queues, and for a complete system, such as a database or a file system, is largely one of complexity, which is often exponentially greater for the complete system. Fundamentally, any system can be modeled as an ADT, and a history independent implementation can be sought for the system.

We introduce a general recipe for building history independent systems as follows:

1) Model the system as an ADT. For a specific example of a file system as an ADT, refer to Section VIII-A.

2) Select a machine model for implementation. While defining the machine state, identify all machine components that the adversary has access to and define the machine state associated with the adversary-accessible components.

3) Depending on the application scenario, fix a desired notion of history independence and the corresponding Δ function.

4) Based on the definition of Δ, provide an implementation over the selected machine model. For complex systems, the implementation will likely require the most effort, since the machine programs implementing the ADT operations must provably ensure that the advantage of the adversary is negligible in the ΔHI game.

In Sections VIII and IX, we follow the above recipe to design two history independent file systems.

VII. On A Philosophical Note 

At a very high level, the motivation for history independence 
can be stated as follows. 

For any logical state $S_l$, the physical state $S_p$ representing $S_l$ may reveal information about the history leading to $S_p$ that is otherwise not discernible from $S_l$ alone.

So far, we have considered the logical state to be the ADT state and the physical state to be the underlying machine state representing the ADT state, that is, the physical state is the set of all bits of the machine. Our selection of logical and physical states may seem rather arbitrary. We make this specific selection because of our adversary model, which assumes that the adversary can interpret information at the level of bits. An adversary that can, for example, examine the electric charge in the individual capacitors used to represent the bits will require a different choice of logical and physical state descriptions. A straightforward choice would be to consider a bit as a logical state and the precise capacitor state as the physical state.

The following interesting question arises from this discussion: is history independence only a matter of perspective? The short answer is yes, history independence is a matter of perspective. There is no universal history independence.

To clarify, consider the universe as a whole from the viewpoint of classical physics. Under the classical viewpoint, knowledge of the current state of all objects in the universe enables the determination of any past or future universal state, since the laws of physics work both forwards and backwards in time. Hence, the past is never hidden and history independence is impossible. For example, using the currently observed movement of galaxies, the past states of the universe can be inferred up to the very initial moments of the big bang.

Physical phenomena at the subatomic scale are explained by quantum physics. At the quantum level, the universe appears nondeterministic. Further, the uncertainty principle [14] restricts the ability to accurately measure the current state of a quantum system. Since the current state cannot be accurately known, it may seem that past states cannot be determined either and that history independence can be achieved at the quantum level.

However, even at the quantum level, history independence is still a matter of perspective. The perspective is governed by the interpretation of quantum physics used. Under the many-worlds interpretation, the multiverse as a whole is deterministic [15]. The probabilistic nature at the quantum level is only our perception, since our observations are limited to a single universe. A hypothetical all-powerful adversary that can view the entire multiverse would have a full view of the past and the future, similar to the case of classical physics, making history independence in the presence of such an adversary impossible.

VIII. Practical SHI for File Systems 

We now apply our theoretical concepts and results towards 
practical history independent system designs. 


Using the recipe from Section VI-B, we design, implement, and evaluate a history independent file system (HIFS) and a delete agnostic file system (DAFS).

In Sections VIII-A to VIII-C, we describe HIFS, an SHI implementation for file systems. In Section IX, we introduce DAFS (delete agnostic file system). DAFS extends HIFS beyond SHI to implement new history independence notions. DAFS aims to be more efficient for scenarios in which canonical representations can be avoided. Further, DAFS extends the functionality and resilience of the file system.

A. HIFS Overview 

Existing file systems, such as Ext3 [16], are not history independent because they organize data on disk as a function of both the files' data and the sequence of file operations. The exact same set of files can be organized differently on disk depending on the sequence of file system operations that created the set. As a result, observations of data organization on disk can potentially reveal the file system's history. Moreover, file system meta-data also contains historical information, such as lists of allocated blocks. Therefore, when observations of data organization are combined with file system meta-data and with knowledge of application logic, significantly more historical information can be derived, for example, enough to fully recover deleted data. It is therefore imperative to hide file system history.

File system history can be hidden by making file system implementations history independent. A straightforward way to achieve this is to use existing history independent data structures to organize files' data on disk. Current techniques to make history independent data structures persistent require the use of history independent hash tables [1]. The history independent hash tables [3] in turn use uniform hash functions. The use of uniform hash functions distributes files' data on storage with no consideration of data locality.

Modern file systems exploit data locality for performance by storing logically related data at nearby physical locations on the storage device. For example, blocks of data belonging to the same file may be stored physically close to each other to reduce seek time on traditional storage devices with mechanical parts. This significantly reduces the latency for file access. Consequently, existing history independent data structures, which do not preserve data locality, cannot be used for practical file system design.


In HIFS, we overcome the challenge of providing history independence while preserving data locality.

Model 

We assume an insider adversary with full access to the system disk. By analyzing data organization on disk, the adversary aims to derive the file system's history. We assume that the adversary can make multiple observations of disk contents. Recall from Section IV-D that thwarting such an adversary requires SHI with canonical representations. Hence, HIFS targets canonical representations for file storage.

Concepts 

1) File System ADT 

A file system organizes data as a set of files. We consider a file to consist of some meta-data and a bit string. That is, a file $f = \{m_f, b_f\}$, where $m_f$ is the file meta-data and $b_f \in \{0,1\}^*$. We define a file system ADT using the file type. Refer to Section III-C for a discussion on ADTs and types.

A file system is an ADT, that is, a pentuple $(S, s_0, O, \Gamma, \Psi)$, where

• $S = 2^{\mathcal{F}}$ is the set of states. Here $\mathcal{F}$ is the set of all files.

• $s_0 \in S$ is the initial state.

• $\Gamma = \mathbb{N} \cup \{0,1\}^* \cup (\mathbb{N} \times \mathbb{N} \times \mathbb{N}) \cup (\mathbb{N} \times \mathbb{N} \times \mathbb{N} \times \{0,1\}^*)$ is the set of all possible inputs to the file system operations. In other words, $\Gamma$ is the union of all possible inputs to the close operation ($\mathbb{N}$), the open operation ($\{0,1\}^*$), the read operation ($\mathbb{N} \times \mathbb{N} \times \mathbb{N}$), and the write operation ($\mathbb{N} \times \mathbb{N} \times \mathbb{N} \times \{0,1\}^*$).

• $\Psi = \mathbb{Z} \cup (\{0,1\}^* \times \mathbb{Z})$ is the set of all possible outputs from the file system operations. In other words, $\Psi$ is the union of all possible outputs of file system operations and all possible outputs for file system meta-data.

• The set of operations is $O = \{open, read, write, delete, close\}$, such that

- open : $S \times \{0,1\}^* \rightarrow S \times \mathbb{Z}$.

- read : $S \times \mathbb{N} \times \mathbb{N} \times \mathbb{N} \rightarrow S \times \{0,1\}^* \times \mathbb{Z}$.

- write : $S \times \mathbb{N} \times \mathbb{N} \times \mathbb{N} \times \{0,1\}^* \rightarrow S \times \mathbb{Z}$.

- delete : $S \times \mathbb{N} \rightarrow S \times \mathbb{Z}$.

- close : $S \times \mathbb{N} \rightarrow S \times \mathbb{Z}$.

File systems including HIFS support several additional 
operations. We have included only a small subset of the 
operations here for brevity. 

2) RAMDisk Machine Model 

In Section III-B we introduced the RAM machine model. The RAM model consists of two components, a central processing unit (CPU) and a random access memory (RAM). However, a file system is generally used to store and manage data on a secondary storage device. Hence, we define the RAMDisk model, which, in addition to the CPU and memory, also includes the storage disk.

Definition 9. RAMDisk Machine Model 
A RAMDisk machine model $\mathcal{M}_D$ with m b-bit memory words, n b-bit CPU registers, and c k-bit disk blocks is a pentuple $(S, s_0, \mathcal{P}, \Gamma, \Psi)$, where $S = \{0,1\}^{b(m+n)+ck}$ is the set of machine states, $s_0 \in S$ is the initial state, $\mathcal{P}$ is the set of all programs of $\mathcal{M}_D$, $\Gamma = \{0,1\}^*$ is a set of inputs, and $\Psi = \{0,1\}^*$ is a set of outputs; each program $p \in \mathcal{P}$ is a function $p : S \times \Gamma_p \rightarrow S \times \Psi_p$, where $\Gamma_p \subseteq \Gamma$ and $\Psi_p \subseteq \Psi$.

$\mathcal{M}_D$ is initialized to state $s_0$. When a program $p \in \mathcal{P}$ with input $i \in \Gamma_p$ is executed by the CPU while $\mathcal{M}_D$ is in state $s_1$, $\mathcal{M}_D$ outputs $\tau \in \Psi_p$ and transitions to a state $s_2$. This transition is denoted as $p(s_1, i) \rightarrow (s_2, \tau)$.

According to our model (Section VIII-A), the adversary has access to the storage disk. For the purpose of history independence, we need to consider the machine states associated with the adversary-accessible components only. Hence, from this point onwards we refer to the storage device state as the machine state. Since the adversary does not access the CPU and RAM components, we permit the CPU and RAM states to reveal history.

3) File System Implementation (Data Structure) 

The objectives of HIFS design are three-fold. 

1) For a given set of files, the organization of files' data and files' meta-data on disk must be the same independent of the sequence of file operations. That is, the file system implementation must be canonically represented and thereby preserve SHI.

2) Despite history independent storage, data locality must 
be preserved. 

3) The implementation must be easily customizable to suit 
a wide range of data locality scenarios. 

HIFS is a history independent implementation of the file system ADT from Section VIII-A1. That is, HIFS is a data structure $\mathcal{D} = (\alpha, \beta, \gamma, s_0^{\mathcal{D}})$ obtained as follows.

• For all $n \in \mathbb{N}_b$, $\alpha(n) \in \{0,1\}^b$. Here, $\mathbb{N}_b = \{x \mid x \in \mathbb{N} \text{ and } x < 2^b\}$, b is the machine word length, and $\alpha(n)$ is the bit string representing n. For all $t_s \in \{0,1\}^*$, $\alpha(t_s) = t_s$, where $t_s$ represents the current disk state in the RAMDisk model. For all $(n_1, n_2, n_3) \in \mathbb{N}_b \times \mathbb{N}_b \times \mathbb{N}_b$, $\alpha((n_1, n_2, n_3)) = \alpha(n_1) \| \alpha(n_2) \| \alpha(n_3)$. For all $(n_1, n_2, n_3, t_s) \in \mathbb{N}_b \times \mathbb{N}_b \times \mathbb{N}_b \times \{0,1\}^*$, $\alpha((n_1, n_2, n_3, t_s)) = \alpha(n_1) \| \alpha(n_2) \| \alpha(n_3) \| t_s$.

• For all $z \in \mathbb{Z}_b$, $\beta(z) \in \{0,1\}^b$. Here, $\mathbb{Z}_b = \{x \mid x \in \mathbb{Z} \text{ and } x < 2^b\}$, b is the machine word length, and $\beta(z)$ is the bit string representing z.

• $\gamma : O \rightarrow \mathcal{P}$. The programs that we provide for each file system operation are the key to achieving SHI. We discuss the HIFS programs in Section VIII-B.

• The initial data structure state corresponding to the initial file system ADT state is obtained by initializing all file system meta-data.
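A minimal Python sketch of the input encoding α follows. It assumes a big-endian, zero-padded binary encoding of each natural number and reads $\|$ as string concatenation; neither choice is specified in the text, and the function names are hypothetical. The output encoding of integers is omitted because the treatment of negative values is not specified.

```python
def alpha_nat(n: int, b: int) -> str:
    """Encode n in N_b = {x in N : x < 2**b} as a b-bit, big-endian bit string."""
    if not 0 <= n < 2 ** b:
        raise ValueError(f"{n} is not in N_{b}")
    return format(n, f"0{b}b")


def alpha_tuple(ns, b: int, data: str = "") -> str:
    """Encode (n1, ..., nr) or (n1, ..., nr, t) as alpha(n1) || ... || alpha(nr) || t."""
    if any(c not in "01" for c in data):
        raise ValueError("data must be a bit string")
    return "".join(alpha_nat(n, b) for n in ns) + data


print(alpha_nat(5, 4))                 # 0101
print(alpha_tuple((1, 2, 3), 2))       # 011011  (a read-style triple)
print(alpha_tuple((1, 2, 3), 2, "1"))  # 0110111 (a write-style triple plus data)
```

Because every number occupies exactly b bits, the concatenation can be split back into its fields unambiguously, which is what lets a program recover its inputs from the encoded bit string.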

B. Architecture 

1) Overview 

A file system ADT state contains two pieces of information for each file: the file meta-data and the file data. HIFS supports SHI by providing unique memory representations for each ADT state. To ensure unique representations, we first select an existing SHI data structure implementation for a hash table ADT (Section VIII-B2). Then, we re-design the hash table implementation to endow it with data locality properties (Section VIII-B3). Finally, we use two instances of the re-designed hash table implementation to store data on disk, one for files' meta-data and the other for files' data (Section VIII-B4).

File system meta-data includes the superblock, group descriptors, inode tables, and disk buckets map. Refer to [17] for the detailed HIFS architecture. The loading of file system programs and memory management are done by the operating system.

Fig. 4. HIFS disk layout.

We refer the reader to [17] for the detailed HIFS architecture. In the following, we focus only on the key features that make HIFS history independent and locality preserving.

2) History Independent Hash Table [3] 

The SHI data structure of choice is the history independent 
hash table from [3]. The hash table in [3] is based on the 
stable matching property of the Gale-Shapley Stable Marriage 
algorithm [18]. 

SHI Hash Table: [3] uses the stable matching property to construct a canonically represented SHI hash table.

The construction in [3] ensures that, for a given set of keys, the hash table data structure state is the same irrespective of the sequence of key insertions and deletions, thereby making the hash table data structure canonically represented.
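To see why stable matching yields canonical representations, here is a small Python sketch of incremental deferred acceptance, in which keys propose to buckets and each bucket keeps the key it prefers. The hash function h(k) = k mod size, the linear-probing key preferences, and the class name are hypothetical choices for illustration, and deletion (which is more involved in [3]) is not shown; the rule that buckets prefer larger keys mirrors rule (d) in Section VIII-B5. Because key-proposing Gale-Shapley produces the key-optimal stable matching regardless of the order in which proposals are made, every insertion order yields the same layout.

```python
from itertools import permutations


class StableMatchingTable:
    """Toy canonically represented hash table built on Gale-Shapley proposals.

    Assumptions (illustrative, not the construction of [3] verbatim):
      * key k prefers buckets h(k), h(k)+1, ... (mod size), with h(k) = k mod size;
      * every bucket prefers numerically larger keys.
    """

    def __init__(self, size: int):
        self.size = size
        self.slots = [None] * size

    def insert(self, key: int) -> None:
        if key in self.slots:
            return
        if None not in self.slots:
            raise OverflowError("table is full")
        proposer, bucket = key, key % self.size
        while True:
            occupant = self.slots[bucket]
            if occupant is None:              # a free bucket accepts the proposal
                self.slots[bucket] = proposer
                return
            if proposer > occupant:           # the bucket prefers the larger key
                self.slots[bucket], proposer = proposer, occupant
            # The rejected key (the proposer or the displaced occupant)
            # continues with its next preferred bucket.
            bucket = (bucket + 1) % self.size


keys = [3, 11, 19, 4, 12]
layouts = set()
for order in permutations(keys):
    table = StableMatchingTable(8)
    for k in order:
        table.insert(k)
    layouts.add(tuple(table.slots))
print(layouts)  # {(None, None, None, 19, 12, 11, 4, 3)}: one layout for all 120 orders
```

The layout depends only on the set of keys and the two preference orders, so changing those orders (rather than the insertion procedure) is what changes the organization of keys, which is the lever discussed next.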

3) Key Insights 

The SHI hash table of [3] can be used as is to organize files' meta-data and files' data on disk. This will yield an SHI file system implementation. However, doing so does not preserve data locality, which is an important goal in HIFS design. A key observation in this context is the following. In the Stable Marriage algorithm, each man in M can rank the n women in W in n! ways, and vice versa. Hence, several sets of preferences from keys to buckets and from buckets to keys are possible, each resulting in a distinct hash table instance. Therefore, by changing the preference orders of keys and buckets, we can control the organization of keys within the hash table.

To enable the re-ordering of preferences, we re-write the algorithms of [3]. The re-write categorizes hash table operations into two procedure sets, a generic set and a customizable set. The generic procedures implement the overall search, insert, and delete operations, and can be used unaltered for all scenarios. The customizable procedures determine the specific key and bucket preferences, thereby governing data organization, including canonical representations and data locality. This new procedure classification and rewrite enables HIFS to realize different data locality scenarios for the same data set through modifications of the customizable procedure set only. The generic procedures and the overall file system operations remain unchanged. We note that this customization is achieved while preserving SHI.

Refer to [3] for proof of canonical representation.

Due to space constraints, we refer the reader to [17] for a complete listing of generic and customizable procedures for several data locality scenarios. In this paper, we focus on the scenario of block group locality. Under block group locality, it is desired that blocks of the same file are located close together on disk, ideally within the same block group.

4) File Storage 

File data is stored in blocks on disk. The blocks are grouped into fixed-size units. Each unit is termed a disk bucket (Figure 4). Like Ext3 [16], HIFS divides the disk into block groups. Each block group contains an inode table, a disk buckets map, and a set of disk buckets. Each entry within the disk buckets map has a one-to-one mapping to the corresponding disk bucket within the same block group. The entry in the disk buckets map contains meta-data about the corresponding disk bucket, such as whether the bucket is free or occupied.

5) Achieving SHI With Data Locality 

Existing file systems, such as Ext3 [16], maintain a list of allocated blocks within the file inode, which renders the disk space allocation history dependent. HIFS, on the other hand, does not rely on allocation lists. Instead, in HIFS, the locations of data blocks are determined using only the current operation parameters and do not depend on past operations.

In HIFS, the disk bucket maps from all block groups are collectively treated as a single SHI hash table. Then, to achieve canonical representations for file system ADT states, the file system operations are translated into SHI hash table operations as follows: (a) Keys are derived from the full file path and from the read or write offset parameters to file system operations. (b) The hash table buckets are the disk buckets map entries. (c) Key preferences are set such that each key first prefers all buckets from one specific block group in a fixed order, then buckets from the next adjacent block group, and so on. This ensures that, with high probability, blocks of the same file will be located within the same block group. (d) Buckets prefer keys with higher numerical values.
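The following Python sketch shows how rules (a), (c), and (d) above could be expressed as customizable procedures. Everything concrete in it is an assumption rather than the HIFS code: SHA-256 for deriving keys and for choosing a file's first block group, ascending bucket order within a group, and 4 buckets per group (the experiments use 8 block groups). In particular, it assumes that the first-choice block group depends only on the file path, so that all blocks of a file share one preference list.

```python
import hashlib

NUM_GROUPS = 8          # block groups, as in the experimental setup
BUCKETS_PER_GROUP = 4   # hypothetical, kept tiny for illustration


def _digest_int(text: str) -> int:
    """Deterministic 64-bit integer derived from text (hypothetical helper)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def key_for(path: str, offset: int) -> int:
    """(a) Hypothetical key derived from the full file path and the read/write offset."""
    return _digest_int(f"{path}\x00{offset}")


def home_group(path: str) -> int:
    """Hypothetical: the block group a file prefers first depends only on its path."""
    return _digest_int(path) % NUM_GROUPS


def key_preferences(path: str) -> list:
    """(c) All buckets of the home group in a fixed order, then the next adjacent group, ..."""
    first = home_group(path)
    prefs = []
    for step in range(NUM_GROUPS):
        group = (first + step) % NUM_GROUPS
        start = group * BUCKETS_PER_GROUP
        prefs.extend(range(start, start + BUCKETS_PER_GROUP))
    return prefs


def bucket_prefers(key_a: int, key_b: int) -> bool:
    """(d) Buckets prefer keys with higher numerical values."""
    return key_a > key_b


prefs = key_preferences("/home/alice/report.txt")
print(prefs[:BUCKETS_PER_GROUP])  # the buckets of this file's home block group
```

Plugging these preferences into a stable-matching insert in place of plain linear probing is what keeps the layout canonical while steering a file's blocks toward one block group.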

The above translation realizes one data locality scenario, referred to as block group locality. In [17] we demonstrate several other scenarios, such as sequential file storage and locality based on external parameters.

C. Experiments 

A detailed evaluation of HIFS for different application profiles and data locality scenarios is available in [17]. Here, we only list partial results (Figure 5) to give a sense of the throughputs that can be achieved under SHI.

We tested HIFS on a file system of size 100 GB with a mean file size of 1 GB. The experiments were conducted for different load factors (file system disk space utilization), denoted by L. The tests were conducted on L × 100 files. We used 4 KB disk blocks with 8 block groups and 5120 disk blocks per bucket. The typical inode size used was 281 bytes and the I/O size was 32 KB.

Fig. 5. HIFS experimental throughputs. Load factor = space utilization.
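As a rough consistency check on this setup, the bucket size and bucket counts work out as follows. This assumes binary units (1 GB = 1024 MB, 1 MB = 1024 KB) and ignores the space taken by meta-data, neither of which is stated above.

$$
\text{bucket size} = 5120 \times 4\,\text{KB} = 20480\,\text{KB} = 20\,\text{MB}
$$
$$
\text{number of buckets} \approx \frac{100\,\text{GB}}{20\,\text{MB}} = \frac{102400\,\text{MB}}{20\,\text{MB}} = 5120, \qquad \frac{5120}{8} = 640 \text{ buckets per block group}
$$
$$
L \times 100 \text{ files} \times 1\,\text{GB per file} = L \times 100\,\text{GB}
$$

The last line shows why testing on L × 100 files of mean size 1 GB fills the 100 GB file system to load factor L; under the same assumptions, a mean-sized file spans about 1024 MB / 20 MB ≈ 51 buckets.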

The performance of HIFS for read operations is comparable to the read throughputs of Ext3 for load factors up to 60%. For higher load factors, write operations incur significant overheads. This is because, as the load factor increases, the per-write overhead to maintain canonical representations increases exponentially. The overhead is the relocation of existing files' data when a new file is being written or when a file is being resized. However, once the write operations achieve canonical representations with block group locality, reads are efficient.

IX. Delete Agnostic File System (DAFS): 
Journaling and DAHI 

The HIFS implementation (Sections VIII-B to VIII-C) supports SHI. As seen from the experimental results (Figure 5), write efficiency is low for higher file system load factors. This is because SHI strictly requires canonical representations. To ensure canonical representations, HIFS relocates data on each write operation. The amount of data relocated increases exponentially with the file system load factor. Hence, the write throughputs are significantly lower for load factors greater than 60%.

Applications that do not require SHI can be made highly efficient using new targeted history independence notions. In Section V-A, using ΔHI, we have defined several new history independence notions that, unlike SHI, do not require canonical representations. We have re-designed the file system layer to support such new notions. Further, we have extended both the functionality and the resilience of the file system. The new file system is called Delete Agnostic File System (DAFS).

Journaled History Independence (JHI). In the event of a system failure, it is imperative that the file system state is not corrupted. To ensure this, file systems typically employ a journal. File system operations are first recorded in the journal and then applied to the file storage area. If a failure occurs while writing to the journal, the operations can be ignored on system recovery. On the other hand, if a failure occurs while writing to file storage, then on recovery the operations can be replayed from the journal. Thus, each write request to the file system causes two disk writes, one to the journal and one to file storage.

DAFS Journaling. In DAFS, a separate region on disk is reserved for a journal in the form of a circular log. The journal contains information for a finite number of file system operations, say k operations. Operations are recorded in the journal in the order in which they are received by the file system. To restore consistency after a system failure, it is essential to maintain operation order. Hence, the sequence of k operations recorded in the journal cannot be hidden. The file storage areas, such as the inode tables, disk bucket maps, and disk buckets, provide SHI just as in the case of HIFS. Hence, once a file system operation is applied to file storage and removed from the journal, its timing can no longer be identified.

Apparent paradox: why journaling increases efficiency 

History independence relaxations that come with journaling 
allow significantly more efficient file system operations due 
to batching. This is explained in the following. 

To maintain canonical representations in HIFS, data is potentially relocated on each file system write operation. The frequency of data relocation increases exponentially with the file system load factor. Hence, for higher load factors, the number of disk writes for each write request to the file system is much greater than the two disk writes required for journaling. Further, the same data blocks may be relocated several times in consecutive write operations. If write operations can be batched, then the number of times a data block is relocated can be reduced by avoiding redundant moves.

In DAFS, we choose to use the journal not only for failure recovery but also as a buffer to batch write operations. Write operations are applied to file storage areas only when the journal is full. During this process, redundant disk writes are eliminated, significantly improving write throughputs.
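A minimal Python sketch of the batching idea follows, with hypothetical names. The journal records every write in arrival order and, once it holds k records, applies them to file storage in one pass in which later writes to the same block supersede earlier ones. It models only the elimination of redundant writes; canonical relocation, crash recovery, and the on-disk circular log are not modeled, and a Python list stands in for the log.

```python
class BatchingJournal:
    """Toy DAFS-style journal: buffers up to k writes and applies them in one batch."""

    def __init__(self, k: int):
        self.k = k
        self.log = []            # (block_id, data) in arrival order; stands in for the circular log
        self.storage = {}        # block_id -> data: the file storage area
        self.journal_writes = 0
        self.storage_writes = 0

    def write(self, block_id: int, data: str) -> None:
        self.log.append((block_id, data))   # every request is recorded in the journal first
        self.journal_writes += 1
        if len(self.log) == self.k:         # journal full: apply the batch to file storage
            self.flush()

    def flush(self) -> None:
        latest = {}
        for block_id, data in self.log:     # replay in order, so later writes win
            latest[block_id] = data
        for block_id, data in latest.items():
            self.storage[block_id] = data   # one storage write per distinct block
            self.storage_writes += 1
        self.log.clear()


journal = BatchingJournal(k=4)
for i in range(8):
    journal.write(7, f"v{i}")               # eight writes to the same block
print(journal.journal_writes, journal.storage_writes)  # 8 2
```

Without batching, the same eight requests would each touch file storage; with a journal of size 4, block 7 is written to storage once per flush.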

A. Delete-Agnostic HI (DAHI) 

Regulations [7] that are specifically concerned with irrecoverable data erasure, and not with other artifacts of history, can be met by systems that support OAHI for the delete operation. We refer to this notion of history independence as delete-agnostic history independence (DAHI). As discussed in Section V-A6, unlike SHI, OAHI for deletes can be achieved without canonical representations. Relaxing the requirement to allow noncanonical representations presents significant efficiency benefits.

To make DAFS preserve DAHI only, we first transform the SHI hash table [3] into a DAHI hash table. Then, we use the DAHI hash table to organize files' data and files' metadata.

1) DAHI hash table 

The SHI hash table from [3] can be transformed into a DAHI hash table as follows:

The hash table insert operation is modified to not maintain 
canonical representations. Instead, the insert operation uses 
linear probing [9] and inserts a key in the first available bucket. 


The SHI hash table delete operation alone provides DAHI. Deletion of a key from the hash table leaves an empty bucket, say bucket $b_1$. The delete operation then finds a key that, according to the Gale-Shapley stable marriage algorithm, prefers bucket $b_1$ over the bucket it is located in, say bucket $b_2$. If such a key is found, it is moved from $b_2$ to $b_1$, making $b_2$ empty. The process is then repeated for bucket $b_2$, and so on, until no key is found for relocation. The net effect of this process is that a sequence of hash table operations that contains a delete operation results in the same hash table state as an insert-only sequence, hiding all evidence of the delete.
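The following Python sketch illustrates the delete-agnostic behavior under simplifying assumptions. Bucket preferences are induced by plain linear probing from a hypothetical hash h(k) = k mod size, and the SHI delete is stood in for by the classic backward-shift deletion for linear probing (as in Knuth's Algorithm R), which refills each hole with the next key in the probe cluster that prefers the hole to its current bucket. It is not the DAFS implementation.

```python
class LinearProbingTable:
    """Toy DAHI-style hash table: linear-probing insert, backward-shift delete."""

    def __init__(self, size: int):
        self.size = size
        self.slots = [None] * size

    def _home(self, key: int) -> int:
        return key % self.size               # hypothetical hash function

    def insert(self, key: int) -> None:
        if key in self.slots:
            return
        if None not in self.slots:
            raise OverflowError("table is full")
        b = self._home(key)
        while self.slots[b] is not None:     # first available bucket
            b = (b + 1) % self.size
        self.slots[b] = key

    def delete(self, key: int) -> None:
        if key not in self.slots:
            return
        hole = self.slots.index(key)
        self.slots[hole] = None
        j = hole
        while True:
            j = (j + 1) % self.size
            candidate = self.slots[j]
            if candidate is None:            # end of the probe cluster: stop
                return
            # The candidate prefers the hole unless its home lies cyclically in (hole, j],
            # in which case moving it would place it before its own home bucket.
            dist_home = (self._home(candidate) - hole) % self.size
            dist_here = (j - hole) % self.size
            if not 1 <= dist_home <= dist_here:
                self.slots[hole], self.slots[j] = candidate, None
                hole = j


with_delete = LinearProbingTable(8)
for k in [1, 9, 17, 2, 10]:
    with_delete.insert(k)
with_delete.delete(9)

insert_only = LinearProbingTable(8)
for k in [1, 17, 2, 10]:
    insert_only.insert(k)

print(with_delete.slots == insert_only.slots)  # True: the delete leaves no trace
```

Insert-only layouts still depend on insertion order here, so this table is not canonically represented; only the delete is hidden, which is exactly the relaxation DAHI permits.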

Theorem 3. The DAHI hash table preserves delete-agnostic history independence.

Proof. Consider the Δ history independence game for operation-agnostic history independence for deletes, played between a challenger C and an adversary A. The ADT considered here is the delete-agnostic hash table. A selects two sequences of operations, $\delta_0$ and $\delta_1$, such that $\delta_0$ inserts and subsequently deletes an element x from the hash table, and $\delta_1$ does not contain any delete operations. To ensure indistinguishability between the two sequences, the delete-agnostic hash table must ensure that applying both sequences of operations to the same initial ADT state results in the same final ADT state.

Now consider that applying $\delta_0$ brings the hash table from an initial state to a given state s with element x placed at position k in the hash table. Also consider that $\delta_0[j] = I(x)$ and $\delta_0[m] = D(x)$, that is, the operation $\delta_0[j]$ inserts x into the hash table and the operation $\delta_0[m]$ deletes x from the hash table.

Consider the elements inserted into the hash table by the
operations in $\delta_0$ up to the $m$-th operation. We divide these
elements into three sets as follows:

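1) $A = \{\, y \mid \delta_0[i] = I(y) \wedge i < j \,\}$.

2) $B = \{\, y \mid \delta_0[i] = I(y) \wedge i > j \wedge i < m \,\}$. Further, $\forall y \in B$,
$y$ cannot be mapped to position $k$ in the hash table using
linear probing.

3) $C = \{\, y \mid \delta_0[i] = I(y) \wedge i > j \wedge i < m \,\}$. Further, $\forall y \in C$,
$y$ can be mapped to position $k$ in the hash table using
linear probing.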

The three sets are constructed so that the elements of each
set are sorted by the order in which they are inserted into the
hash table. To illustrate, consider a set $S \in \{A, B, C\}$ and
two elements $a, b \in S$ such that $S_i = a$ and $S_j = b$, where
$S_k$ is the $k$-th element of set $S$. Also let $\delta_0[p] = I(a)$
and $\delta_0[q] = I(b)$. Then, the sorted property of the sets implies
that $i < j$ only if $p < q$.

When the delete operation for $x$ is executed in $\delta_0$, the
elements of $A$ and $B$ are not affected, owing to the design of
the hash table. Further, by construction, once $x$ is deleted, the
first element of $C$ is placed at position $k$ and all other
elements already placed in the hash table are remapped (if
necessary). If $C = \emptyset$, then nothing is written to position $k$
after the delete. Without loss of generality, let $C = \{c_1, c_2, \ldots\}$.
Also let $D = \{d_1, d_2, \ldots\}$ be the elements inserted into the
hash table after $x$ was inserted. Once the delete operation is
executed in $\delta_0$, $c_1$ will be placed at position $k$ and the
elements in $D$ will be remapped to positions in the hash table as
if $x$ had never been inserted. Since $\delta_0$ does not contain any
other delete operations, the state resulting from applying $\delta_0$
to the delete-agnostic hash table is equivalent to the state
resulting from an insert-only sequence that does not insert and
subsequently delete $x$. Note that the definition of the $\Delta$
function for operation-agnostic history independence for deletes
requires the adversary to select $\delta_1$ to be exactly such a
sequence in step 1 of the game. Hence, applying $\delta_1$ to the
delete-agnostic hash table leads to the same modifications to the
hash table as applying $\delta_0$. This ensures that the two sequences
of operations are indistinguishable to the adversary and guarantees
that the adversary cannot win the operation-agnostic history
independence game for deletes on the delete-agnostic hash table
with more than negligible advantage.

□

¹ For a complete listing of SHI hash table operations, refer to [17].

Fig. 6. TPCC throughputs for Ext3, HIFS, and DAFS with file system load
factor. Load factor = space utilization.




2) DAHI in DAFS 

DAFS uses the DAHI hash table for file storage. The DAHI
hash table insert operation is not required to maintain canonical
representations. Since the hash table insert operation is
used by the file system write operation, the overhead of maintaining
canonical representations on file writes is eliminated.

When a file is deleted in DAFS, each disk bucket allocated to
the file is released with the same effect as deleting a key from
the DAHI hash table. As a result, no evidence of a delete remains
in the file system state, and DAHI is preserved.

Changing the history independence notion from SHI in
HIFS to DAHI in DAFS has significant potential for efficiency.
DAHI requires significantly fewer writes to disk buckets than
SHI, because write operations are no longer required to maintain
canonical representations. As a result, when disk buckets are
allocated to a file, other files' data need not be relocated. This
relocation of data was precisely the reason for the lower write
throughput of HIFS.
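To make the write-count argument concrete, the sketch below counts bucket writes for two insert procedures on the same keys. The SHI side is an assumed stand-in, a priority-ordered linear-probing insert in the style of ordered hashing, in which a higher-priority key displaces a lower-priority resident; the exact SHI insert of [3] may differ. Function names and the `home` mapping are hypothetical, and in-memory bucket writes stand in for disk bucket writes.

```python
from typing import Callable, List, Optional

Buckets = List[Optional[str]]


def dahi_insert(buckets: Buckets, key: str, home: Callable[[str], int]) -> int:
    """First-available linear probing; returns the number of bucket writes."""
    if None not in buckets:
        raise RuntimeError("hash table is full")
    n = len(buckets)
    i = home(key) % n
    while buckets[i] is not None:
        i = (i + 1) % n
    buckets[i] = key
    return 1  # only the new key's bucket is written


def shi_insert(buckets: Buckets, key: str, home: Callable[[str], int]) -> int:
    """Priority-ordered linear probing (assumed stand-in for the SHI insert).

    Smaller keys have higher priority. A higher-priority key displaces a
    lower-priority resident, which is carried forward and re-placed, so
    existing keys may be rewritten. Returns the number of bucket writes.
    """
    if None not in buckets:
        raise RuntimeError("hash table is full")
    n = len(buckets)
    i = home(key) % n
    carry, writes = key, 0
    while True:
        resident = buckets[i]
        if resident is None:
            buckets[i] = carry
            return writes + 1
        if carry < resident:
            buckets[i], carry = carry, resident  # relocation: one extra write
            writes += 1
        i = (i + 1) % n


homes = {"a": 0, "b": 0, "c": 0, "d": 0}
shi: Buckets = [None] * 6
dahi: Buckets = [None] * 6
shi_writes = sum(shi_insert(shi, k, homes.__getitem__) for k in "dcba")
dahi_writes = sum(dahi_insert(dahi, k, homes.__getitem__) for k in "dcba")
print(shi_writes, dahi_writes)  # 10 4
```

Here every key hashes to bucket 0 and arrives in reverse priority order, so each SHI insert shifts all previously stored keys to keep the canonical order, while each DAHI insert writes exactly one bucket.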

B. Experiments 

DAFS implements two new history independence notions, JHI
and DAHI, both aimed at increasing file system efficiency.
DAFS can be configured to use DAHI and JHI either exclusively
or together. If DAFS is configured to use both JHI and DAHI,
then DAFS uses DAFS journaling (Section IX). In this case, if the
journal contains an entry for a delete operation, then the adversary
can learn about this delete from the journal. Thus, DAFS allows
the user to configure the file system for better performance at
the cost of revealing a few deletes to the adversary. In this
section, we compare the performance of DAFS and HIFS.

TABLE III

Summary of history independent data structures. $\alpha$ ← load
factor, $N$ ← number of keys, $B$ ← block transfer size. Also, I:
insert, L: lookup, D: delete, R: range.

Data Structure   | SHI or WHI? | Runtime
2-3 Tree [20]    | WHI         | $O(\log N)$
Hash Table [11]  | SHI         | $O(\log(1/(1-\alpha)))$
Hash Table [3]   | SHI         | $O(1/(1-\alpha)^3)$
Hash Table [2]   | SHI         | I, D: $O(\log N)$; S: $O(1)$
B-Treaps [21]    | SHI         | $O(\log_B N)$
B-SkipList [22]  | SHI         | $O(\log_B N)$

All experiments were conducted on servers with two Intel Xeon
quad-core CPUs at 3.16 GHz, 8 GB RAM, and kernel v3.13.0-24.
The storage devices of choice are Hitachi HDS72302 SCSI
drives.

DAFS is implemented as a C++ based user-space FUSE file
system. All data structures, including the DAHI hash table, were
written from scratch. We tested DAFS on a file system of size
10 GB with a mean database size of 1 GB. The experiments were
conducted on 1 to 10 databases. We used 4 KB disk blocks with
4 block groups and 512 disk blocks per bucket. The typical
inode size used was 281 bytes.

To evaluate a real-world scenario, we use the TPCC [19]
database benchmark. The database of choice is SQLite. SQLite
data files are stored using HIFS, DAFS (without journaling),
and Ext3. The BenchmarkSQL tool is used to generate the
TPCC workload.

Each test run starts with an empty file system and
creates new databases on file system storage. The number
of databases is increased until the file system is 90% full.
The TPCC scale factor is 10, giving a size of 1 GB for each
database. Throughputs are measured at specific load factors
ranging from 10% to 90%.
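As a back-of-the-envelope reading of this configuration, assuming binary units (1 GB = 1024 MB, 1 MB = 1024 KB) and ignoring inodes and other metadata, with $n_{\mathrm{db}}$ a symbol introduced here for the number of databases:

$$
512 \times 4\,\mathrm{KB} = 2\,\mathrm{MB}\ \text{per bucket}, \qquad
\frac{10\,\mathrm{GB}}{2\,\mathrm{MB}} = 5120\ \text{buckets}, \qquad
\frac{1\,\mathrm{GB}}{2\,\mathrm{MB}} = 512\ \text{buckets per database},
$$

$$
\text{load factor} \approx \frac{n_{\mathrm{db}} \times 1\,\mathrm{GB}}{10\,\mathrm{GB}}
\quad\Longrightarrow\quad
\text{a } 90\%\ \text{load factor corresponds to } n_{\mathrm{db}} \approx 9 .
$$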

Figure 6 reports the throughputs for HIFS, DAFS, and Ext3.
As per the TPCC specification, throughputs are reported as
new-order transactions executed per minute (tpmC). As seen,
the throughput of DAFS is up to 4x that of HIFS for load
factors > 50%. Note that the performance of Ext3 is included
only as a reference; Ext3 does not provide DAHI.

For load factors < 50%, HIFS and DAFS exhibit similar
performance. At lower load factors, fewer collisions occur as
new files are added to file system storage. Fewer collisions
mean that data is relocated to maintain canonical representations
less frequently at load factors < 50%. Hence, the performance
of DAFS and HIFS is similar at low load factors.
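The textbook estimate for linear probing under the uniform hashing assumption (Knuth, The Art of Computer Programming, Vol. 3, Sec. 6.4) shows how sharply collisions grow with the load factor $\alpha$. It describes an idealized table rather than measured DAFS or HIFS behavior, and applying it to disk buckets is our assumption. The expected number of probes $C'_\alpha$ for an unsuccessful search, which is what an insert into the first empty bucket performs, is approximately

$$
C'_\alpha \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right),
\qquad
C'_{0.5} \approx 2.5,
\qquad
C'_{0.9} \approx 50.5 .
$$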

X. Related Work 

Existing history independent data structures are summarized
in Table III. The data structures in Table III assume a
rewritable storage medium. [4] designed a history independent
solution for a write-once medium. The construction is based
on the observation from [11] that a lexicographic ordering of
elements in a list is history independent. However, write-once
memories do not allow in-place sorting of elements. Instead, [4]
employs copy-over lists [11]. When a new element is inserted,
a new list is stored while the previous list is erased. This
requires $O(n^2)$ space to store $n$ keys.
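The quadratic space bound follows from a simple count: if every insert writes a fresh, sorted copy of the whole list to unused cells, the $i$-th insert consumes $i$ cells, for $n(n+1)/2$ cells in total. The sketch below models only this space cost; the class name and the `cells_consumed` counter are hypothetical, and it does not model how [4] lays out or erases lists on a real write-once medium.

```python
from typing import List


class CopyOverList:
    """Hypothetical space model of a copy-over list on write-once memory."""

    def __init__(self) -> None:
        self.current: List[int] = []  # latest (live) copy, kept sorted
        self.cells_consumed = 0       # write-once cells used so far

    def insert(self, key: int) -> None:
        # Sorted order makes the live copy independent of insertion order.
        self.current = sorted(self.current + [key])
        # The new copy goes to fresh cells; old cells are never reused.
        self.cells_consumed += len(self.current)


lst = CopyOverList()
for key in [42, 7, 19, 3]:
    lst.insert(key)
print(lst.current, lst.cells_consumed)  # [3, 7, 19, 42] 10
```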

[5] improves on [4], requiring only linear storage. The key
idea is to store all elements in a global hash table and, for
each entry of the hash table, maintain a separate copy-over list
containing only the colliding elements.
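A matching space model of the per-entry idea: copying is confined to the keys that collide in one entry, so an entry holding $k$ keys consumes $k(k+1)/2$ cells, and the total stays linear in $n$ as long as each entry holds $O(1)$ keys. As before, all names are hypothetical and only the space cost is modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class HashedCopyOverLists:
    """Hypothetical space model: one copy-over list per hash table entry."""

    def __init__(self, num_entries: int, home: Callable[[int], int]) -> None:
        self.num_entries = num_entries
        self.home = home  # assumed hash function: key -> table entry
        self.lists: Dict[int, List[int]] = defaultdict(list)
        self.cells_consumed = 0

    def insert(self, key: int) -> None:
        entry = self.home(key) % self.num_entries
        # Only the colliding keys of this entry are copied over.
        self.lists[entry] = sorted(self.lists[entry] + [key])
        self.cells_consumed += len(self.lists[entry])


table = HashedCopyOverLists(50, lambda k: k)
for key in range(100):
    table.insert(key)
print(table.cells_consumed)  # 150
```

With 100 keys spread evenly over 50 entries, each entry holds two keys and consumes 3 cells, for 150 cells in total, compared with 5050 cells for a single copy-over list of the same 100 keys.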

XI. Conclusions 

In this paper, we took a deep look into history independence 
from both a theoretical and a systems perspective. We explored 
the concepts of abstract data types, machine models, data 
structures and memory representations. We identified the need 
for history independence from the perspective of ADT and 
data structure state transition graphs. Then, we introduced $\Delta$
history independence, which serves as a general framework
to define a broad spectrum of history independence notions,
including strong and weak history independence. We also
outlined a general recipe for building history independent systems
and illustrated its use in designing two history independent file
systems.